Automating the Generation of Hardware Component Knowledge Bases
Luke Hsiao, Sen Wu, Nicholas Chiang, Christopher Ré, and Philip Levis
Published in Proceedings of the 20th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES ’19), June 2019.
Abstract
Hardware component databases are critical resources in designing embedded systems. Since generating these databases requires hundreds of thousands of hours of manual data entry, they are proprietary, limited in the data they provide, and contain many random data-entry errors. We present a machine-learning-based approach for automating the generation of component databases directly from datasheets. Extracting data directly from datasheets is challenging because: (1) the data is relational in nature and relies on non-local context, (2) the documents are filled with technical jargon, and (3) the datasheets are PDFs, a format that decouples visual locality from locality in the document. The proposed approach uses a rich data model and weak supervision to address these challenges. We evaluate the approach on datasheets of three classes of hardware components and achieve an average quality of 75 F1 points, which is comparable to existing human-curated knowledge bases. We perform two application studies that demonstrate the extraction of multiple data modalities, such as numerical properties and images. We show how different sources of supervision, such as heuristics and human labels, have distinct advantages that can be used together within a single methodology to automatically generate hardware component knowledge bases.
Data (WWW), Paper (1MB)
BibTeX entry
@inproceedings{hack-lctes19,
  author = "Luke Hsiao and Sen Wu and Nicholas Chiang and Christopher Ré and Philip Levis",
  title = "{Automating the Generation of Hardware Component Knowledge Bases}",
  booktitle = "{Proceedings of the 20th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES ’19)}",
  year = {2019},
  month = {June}
}




