Handwriting Data and Applications
CS F453/F680 Final Project
By Andrew Mattson | December 8th, 2023
Preface
Note to Lawlor: I am okay with you posting some of my pictures to advertise Robotics and 3D Printing!
This two-for-one project meets the final project requirements for two different classes, so its applications vary widely. The SVG and GCODE data formats were implemented for CS F453, Robotics and 3D Printing, with Professor Orion Lawlor. The prompt-based, cloud-stored JSON data was implemented for CS F680, Data Engineering (Topics in Computer Science), with Professor Arghya Das.
Project Overview
Objective:
To enable machine replication of handwriting, and to provide a modern, accessible platform for collecting, storing, and provisioning dynamic handwriting data for applications in AI, data analysis, and digital handwriting technologies.
Key Features:
User-friendly OpenCV drawing program for data collection
Diverse output formats: GCODE, SVG, and JSON
Integration with MongoDB Atlas cloud database for data accessibility
An exploration of AI handwriting synthesis using custom data
The Need for Modern Handwriting Data
Existing handwriting databases often fall short in capturing the dynamic and precise nature of handwriting, which limits their use in fields such as progressive handwriting analysis and personalized writing synthesis. Taking inspiration from this YouTube video and this Reddit post, this project addresses that gap with a system that captures handwriting data in several user-friendly formats. These formats support a variety of tasks, including handwriting replication, synthesis, and web display. The JSON data is the most flexible of the three formats and is intended for statistical analysis and AI model training.
Project Components
Data Collection: OpenCV Program
Prompt Mode: Users generate JSON data by responding to specific prompts.
Non-Prompt Mode: Users can draw freely, saving data as SVG or GCODE files.
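The capture step behind both modes can be sketched as a small stroke recorder driven by OpenCV mouse events. This is a minimal sketch and not the project's actual code: the names StrokeRecorder and run_canvas, the window name, and the (x, y, timestamp) point layout are all assumptions made for illustration. The event codes are written out as plain integers that match cv2.EVENT_MOUSEMOVE, cv2.EVENT_LBUTTONDOWN, and cv2.EVENT_LBUTTONUP, so the recorder itself can be exercised without a display.

```python
import time

# Integer values of cv2.EVENT_MOUSEMOVE, cv2.EVENT_LBUTTONDOWN, cv2.EVENT_LBUTTONUP
EVENT_MOUSEMOVE = 0
EVENT_LBUTTONDOWN = 1
EVENT_LBUTTONUP = 4


class StrokeRecorder:
    """Hypothetical helper: groups mouse events into pen strokes."""

    def __init__(self, clock=time.monotonic):
        self.strokes = []        # each stroke is a list of (x, y, t) points
        self._drawing = False    # True while the left button is held down
        self._clock = clock

    def on_mouse(self, event, x, y, flags=0, param=None):
        """Same signature as the callback passed to cv2.setMouseCallback."""
        if event == EVENT_LBUTTONDOWN:
            # Pen down: start a new stroke
            self._drawing = True
            self.strokes.append([(x, y, self._clock())])
        elif event == EVENT_MOUSEMOVE and self._drawing:
            # Pen moving while down: extend the current stroke
            self.strokes[-1].append((x, y, self._clock()))
        elif event == EVENT_LBUTTONUP and self._drawing:
            # Pen up: close the current stroke
            self.strokes[-1].append((x, y, self._clock()))
            self._drawing = False


def run_canvas(recorder, height=480, width=640):
    """Open a white canvas and record strokes until Esc is pressed."""
    import cv2            # imported here so the recorder works without OpenCV
    import numpy as np

    canvas = np.full((height, width, 3), 255, dtype=np.uint8)
    cv2.namedWindow("canvas")
    cv2.setMouseCallback("canvas", recorder.on_mouse)
    while True:
        # Redraw every recorded segment (simple, not optimized)
        for stroke in recorder.strokes:
            for (x0, y0, _), (x1, y1, _) in zip(stroke, stroke[1:]):
                cv2.line(canvas, (x0, y0), (x1, y1), (0, 0, 0), 2)
        cv2.imshow("canvas", canvas)
        if cv2.waitKey(10) & 0xFF == 27:   # Esc key
            break
    cv2.destroyAllWindows()
```

Calling run_canvas(StrokeRecorder()) would open the drawing window. Keeping a timestamp with each point is one way to preserve the dynamic (time-ordered) aspect of the handwriting; whether the real program stores timing this way is not stated here.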
Data Formats: GCODE, SVG, and JSON
GCODE: Enables precise replication for CNC writing robots or modified 3D printers.
SVG: Lightweight graphics format for web display and digital signatures.
JSON: Comprehensive and dynamic data for statistical analysis and AI model training.
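To make the three formats concrete, here is a rough sketch of how one set of strokes could be written out as SVG, GCODE, and JSON. It is an illustration under stated assumptions, not the project's exporter: the function names, the JSON field names ("prompt", "strokes", "x", "y"), the pixel-to-millimetre scale, the pen heights, and the feed rate are all hypothetical and would need tuning for a real printer and schema.

```python
import json

# Assumed input: a list of strokes, each stroke a list of (x, y) pixel points.


def strokes_to_svg(strokes, width, height):
    """One <path> per stroke, using move-to (M) and line-to (L) commands."""
    paths = []
    for stroke in strokes:
        if len(stroke) < 2:
            continue  # a single point has no visible line segment
        d = "M " + " L ".join(f"{x} {y}" for x, y in stroke)
        paths.append(
            f'<path d="{d}" fill="none" stroke="black" stroke-width="2"/>'
        )
    body = "\n  ".join(paths)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">\n  {body}\n</svg>'
    )


def strokes_to_gcode(strokes, height, scale=0.25,
                     pen_up=5.0, pen_down=0.0, feed=1500):
    """Travel with the pen lifted (G0), draw with the pen lowered (G1)."""
    lines = [
        "G21 ; units in millimetres",
        "G90 ; absolute positioning",
        f"G0 Z{pen_up:.2f} ; lift pen",
    ]
    for stroke in strokes:
        if not stroke:
            continue
        # Image Y grows downward, printer Y grows upward, so flip Y
        pts = [(x * scale, (height - y) * scale) for x, y in stroke]
        x0, y0 = pts[0]
        lines.append(f"G0 X{x0:.2f} Y{y0:.2f}")       # travel to stroke start
        lines.append(f"G1 Z{pen_down:.2f} F{feed}")   # lower pen
        for x, y in pts[1:]:
            lines.append(f"G1 X{x:.2f} Y{y:.2f} F{feed}")
        lines.append(f"G0 Z{pen_up:.2f}")             # lift pen
    return "\n".join(lines) + "\n"


def strokes_to_json(strokes, prompt):
    """Hypothetical document layout pairing a prompt with its strokes."""
    doc = {
        "prompt": prompt,
        "strokes": [[{"x": x, "y": y} for x, y in s] for s in strokes],
    }
    return json.dumps(doc)
```

For example, strokes_to_gcode([[(0, 0), (10, 20)]], height=100, scale=0.5) travels to X0.00 Y50.00 and then draws to X5.00 Y40.00. The Y flip matters because image coordinates put the origin at the top-left, while a printer bed typically puts it at the front-left.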
Accessibility and Provisioning: MongoDB Atlas
JSON handwriting data stored in MongoDB Atlas cloud database.
Public, download-only account provided for database access through MongoDB Compass:
Username: public
Password: GimmeWritingData
Connection string: mongodb+srv://public:GimmeWritingData@HandwritingData.lbgarej.mongodb.net/
A dedicated script (atlas_data.py) handles data uploading. Uploading requires an authorized account; contact me for one.
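For readers who prefer a script over MongoDB Compass, the public account above could also be used from Python. The following is a minimal sketch, assuming the pymongo package (with SRV support, which depends on dnspython) is installed. The helper names build_uri, list_collections, and fetch_samples are hypothetical, and because the database and collection names are not given here, the sketch discovers them at run time instead of guessing.

```python
from urllib.parse import quote_plus


def build_uri(user, password, host):
    """Assemble a mongodb+srv connection string, escaping the credentials."""
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"


# Public, download-only credentials from the section above
PUBLIC_URI = build_uri(
    "public", "GimmeWritingData", "HandwritingData.lbgarej.mongodb.net"
)


def list_collections(uri):
    """Map each visible database name to its collection names."""
    from pymongo import MongoClient  # imported lazily; needs network access

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    try:
        return {
            name: client[name].list_collection_names()
            for name in client.list_database_names()
        }
    finally:
        client.close()


def fetch_samples(uri, db_name, collection_name, limit=5):
    """Download up to `limit` documents from one collection."""
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    try:
        cursor = client[db_name][collection_name].find({}).limit(limit)
        return list(cursor)
    finally:
        client.close()
```

A typical session would call list_collections(PUBLIC_URI) first, then pass one of the returned database and collection names to fetch_samples. Which databases the public account is allowed to list depends on how its permissions are configured.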
Exploring the GCODE Format
For the portion of this project that emphasizes robotics and 3D printing, the GCODE format is the most directly applicable. Below is a slideshow showing how I configured my 3D printer to write using the GCODE files generated by the OpenCV drawing program, along with a quick exploration of handwriting synthesis. The handouts mentioned in the slides were showcased in class and will be posted next to or below the slides. The OpenSCAD/STL files for the pen mount are also available on the GitHub page.
Below is an embed of an STL file generated by my program:
Results and Future Work
All primary project goals, including data creation and provisioning, were successfully implemented. The project explored a stretch goal involving AI handwriting synthesis, showcasing promising results with room for refinement. Future work could focus on enhanced data collection, AI model refinement, user interface enhancements, and additional database security measures.
Conclusion
This project establishes a working framework for modern handwriting data that addresses limitations of existing databases. The choice of MongoDB, along with the GCODE, SVG, and JSON formats, makes the data usable for diverse applications, from robotic replication to web display and AI training. With the primary goals implemented and several applications explored, the project connects traditional handwriting with contemporary data technologies.