As agile AI solution builders, we work with many LLMs and the latest AI frameworks and libraries. We built AppCoder to generate Python code for generative AI projects, with the goals of speeding up AI development, advancing creativity, and improving the code that powers our AI projects. We use AppCoder ourselves and when building alongside our enterprise clients, and you can try it yourself.
THE RESULT
Interplay-AppCoder LLM, a revolutionary, high-performing code generation model
Scoring high on the ICE benchmark
The ICE methodology uses Usefulness and Functional Correctness metrics as a baseline for scoring code generation. Read more about the ICE methodology in this paper.
We used GPT-4 to grade each metric on a scale of 0 to 4. This is the test dataset and Jupyter Notebook we used to perform the benchmark.
Usefulness: whether the code output from the model is clear, presented in a logical order, and human-readable, and whether it covers all functionalities of the problem statement when compared with the reference code.
Functional Correctness: an LLM with complex reasoning capabilities is used to conduct unit-test-style checks, taking into account the given question and the reference code.
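To make the pipeline concrete, here is a minimal sketch of an ICE-style LLM-as-judge scoring pass. The prompt wording, helper name `score_generation`, and response parsing are our assumptions for illustration, not the exact code from the benchmark notebook.

```python
# A minimal sketch of an ICE-style LLM-as-judge scoring pass.
# The prompt wording and parsing below are illustrative assumptions,
# not the exact code from the benchmark notebook.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """\
You are grading generated code against a reference solution.
Rate each criterion on an integer scale from 0 (worst) to 4 (best).

Problem statement:
{problem}

Reference code:
{reference}

Generated code:
{candidate}

Reply with exactly two lines:
Usefulness: <0-4>
Functional Correctness: <0-4>
"""

def score_generation(problem: str, reference: str, candidate: str) -> dict[str, float]:
    """Ask GPT-4 to score one generation on the two ICE metrics."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic grading
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                problem=problem, reference=reference, candidate=candidate
            ),
        }],
    )
    # Parse the two "Metric: score" lines from the judge's reply.
    scores: dict[str, float] = {}
    for line in response.choices[0].message.content.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().replace(".", "", 1).isdigit():
            scores[name.strip()] = float(value.strip())
    return scores
```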
| Model Name | Usefulness (0-4) | Functional Correctness (0-4) |
| --- | --- | --- |
| Interplay AppCoder LLM | 2.968 | 2.476 |
| WizardCoder | 1.825 | 0.603 |
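The table values are averages over the test dataset. As a rough illustration, assuming a dataset of problem/reference/candidate records and reusing the hypothetical `score_generation` helper sketched above, the aggregation could look like:

```python
# Hypothetical aggregation over the benchmark dataset; the record fields
# and helper name reuse the assumptions from the scoring sketch above.
from statistics import mean

def benchmark(dataset: list[dict]) -> dict[str, float]:
    """Average per-example judge scores into table-style results."""
    per_example = [
        score_generation(ex["problem"], ex["reference"], ex["candidate"])
        for ex in dataset
    ]
    return {
        metric: mean(scores[metric] for scores in per_example)
        for metric in ("Usefulness", "Functional Correctness")
    }
```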
What we are doing