I joined Google in November 2008. I'm based in Mountain View, CA.
Most of my initial work was on the Borg cluster management system that allocates resources to Google's internal compute jobs. Here are a couple of external articles about that system; you can find out much more on my publications page:
- Inside the Hacker Mind: John Wilkes on Google Omega. Cade Metz, Wired. April 2013.
- Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon. Cade Metz, Wired. March 2013.
In October 2016, I joined the Resource And Infrastructure Information Technology (RIOT!) team that automates the fulfillment and provisioning of computing resources to all of Google: we're on the hook to keep up with Google's exponential growth in demand, as efficiently as possible. At our scale, that's quite a big deal.
I was at HP Labs for nearly 26 years. Here's a short summary of what I got up to there.
Just before I left HP, I helped found Open Cirrus, an open cloud-computing research testbed designed to support research into the design, provisioning, and management of services at a global, multi-datacenter scale. The original slogan: "PlanetLab for data centers".
The topic area that started my interest in self-managing systems was storage. I founded and led the HP Labs Storage Systems Program for several years.
Together, we worked on the automatic design and configuration of enterprise-scale storage systems, with an emphasis on storage-area networks (SANs) and disk arrays. Our goal was to make such systems much easier to manage.
Accomplishing this required work in:
- Capturing high-level goals for the storage system, expressed as Quality-of-service based Service Level Agreements (QoS-SLAs)
- Automatically determining the appropriate design and configuration of a storage system to meet those goals, and the placement of data in it. This includes tools to design network topologies for high-speed Storage-Area Networks (SANs) such as Fibre Channel.
- Online system management tools that allow data to be moved around on the fly, while it's being accessed, and help to close the loop, allowing a storage system to be completely self-managing.
- Mechanisms and languages for describing storage system requirements, capabilities, designs, and deployments.
- Scalable storage system architectures that can deliver big-system reliability and performance with small-system price and flexibility.
A retrospective on this work was published in Operating Systems Review, January 2009.
Storage, data, and information systems
In 2007-2008 Christopher Hoover, Beth Keer, Pankaj Mehra, Alistair Veitch and I wrote a book on the storage stack, from hardware up to modern information systems (I was the editor and publisher). It grew out of a technology briefing for the HP board of directors. A couple of thousand copies were given away at HP storage conferences, and to students in a CMU class. For about $15, you can buy a copy from Amazon.com.
Before this, I was active in several areas, including:
- Helping HP's Storage Systems Division in Boise develop the HP AutoRAID technology.
- Network architecture (the Hamlyn sender-based message model).
- Operating systems (e.g., the Brevix project).
Many of the papers published by me and other members of the research groups in which I have worked over the years are available online*, together with some of the software and disk I/O traces* that we and other researchers used.
(*) The HP Labs links now point to a version on the wayback machine archive.
I'm not a mathematician, but my Erdős number is 3, according to AMS's collaboration-distance tool: Eric Anderson → Michael Saks → Paul Erdős.
I am named as an inventor on 40 issued US patents. The full list can be found here.
- Principal Software Engineer, Google, 2010.
- HP Fellow, 2002.
- ACM Fellow, 2002 (member since 1981).
- SNIA Outstanding contribution award for work on the SNIA shared storage model, October 2001.
- PhD, University of Cambridge, 1984: Workstation design for distributed computing.
- British Computer Society Wilkes Award for the best paper published in 1984 in the Computer Journal by an author or authors aged under 30. The award is named for Maurice Wilkes, director of the Computer Laboratory in Cambridge c. 1947–1980.
The paper was: The Rainbow workstation [PDF 5.3MB]. A. J. Wilkes, D. W. Singer, J. J. Gibbons, T. R. King, P. Robinson, and N. E. Wiseman. Computer Journal 27(2):112-120, May 1984. DOI: 10.1093/comjnl/27.2.112. © 1982 British Computer Society.
- The Rainbow workstation itself (my PhD thesis research) was given the British Computer Society Technology Award for 1982.
- Diploma in Computer Science, University of Cambridge, 1979.
- BA in Natural Sciences, University of Cambridge, 1978.