Fault detection and diagnosis (FDD) represents one of the most active areas of research and commercial product development in the buildings industry. This paper addresses two questions concerning FDD implementation and advancement 1) What are today's users of FDD saving and spending on the technology? 2) What methods and datasets can be used to evaluate and benchmark FDD algorithm performance? Relevant to the first question, 26 organizations that use FDD across a total 550 buildings and 97 M sf achieved median savings of 8%. Twenty-seven FDD users reported that the median base cost for FDD software, annual recurring software cost, and annual labor cost were $8, $2.7 and $8 per monitoring point, with a median implementation size of approximately 1300 points. To address the second question, this paper describes a systematic methodology for evaluating the performance of FDD algorithms, curates an initial test dataset of air handling unit (AHU) system faults, and completes a trial to demonstrate the evaluation process on three sample FDD algorithms. The work provided a first step toward a standard evaluation of different FDD technologies. It showed the test methodology is indeed scalable and repeatable, provided an understanding of the types of insights that can be gained from algorithm performance testing, and highlighted the priorities for further expanding the test dataset.