Hey guys, I’m looking for an LLM that is fine-tuned entirely on Java projects, specifically Java Spring Boot. Maybe no one has curated a large enough dataset from such projects to build one, and the web search capabilities of current models are highly limited by their small context windows.
There is an enormous amount of documentation and other data available for Java; the ecosystem is just too big. But I’ve observed that even GPT-4 sucks at creating Java projects from scratch. GitHub Copilot (which claims to use GPT-4) never gives me executable code that runs without errors, even for the most basic Spring app. Yet it can generate almost any kind of Python code I need (especially for training predefined models) that works without errors in most cases.
I’ve observed similar issues with DeepSeek Coder and Code Llama 34B. It looks like the datasets used to train these models contained far more Python samples than Java samples.
Basically, from what I understand, all LLMs are just a set of insanely advanced mathematical functions that read the user input and their own generated output and predict the next most suitable token (which is represented as a number inside the model), and this makes generalization a very tough task. The thing is, I don’t need the model to produce buggy code in a million different languages; I need decent, executable code in just one. I also don’t need the model to know irrelevant general knowledge, just basic English to understand the problem and advanced Java programming skills. I know this is the idea behind the mixture-of-experts approach, but I feel those experts are still too broad. The Java ecosystem is so vast that I could define tasks for 8 or even 16 different experts for the Java ecosystem alone. The issue I’m currently facing with LLMs is not overfitting but underfitting.
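To make the “predict the next token, represented as a number” idea concrete, here is a toy sketch of greedy decoding in Java. It is not how a real LLM computes its scores: the hypothetical `SCORES` table is a hand-written bigram lookup standing in for the neural network, and the six-word vocabulary is invented for illustration. The loop itself (score every candidate token ID, pick the highest, append it, repeat until an end token) is what greedy decoding does.

```java
import java.util.ArrayList;
import java.util.List;

final class GreedyDecoder {
    // Toy vocabulary: each token is just an integer ID (an index into this array).
    static final String[] VOCAB = {"<end>", "public", "class", "App", "{", "}"};

    // Hypothetical bigram "model": SCORES[prev][next] is how strongly `next` follows `prev`.
    // A real LLM computes such scores with a neural network over the whole context.
    static final double[][] SCORES = {
        // <end> public class App   {    }
        {1.0, 0.0, 0.0, 0.0, 0.0, 0.0}, // after <end>
        {0.0, 0.0, 0.9, 0.1, 0.0, 0.0}, // after public
        {0.0, 0.0, 0.0, 0.8, 0.2, 0.0}, // after class
        {0.0, 0.0, 0.0, 0.0, 0.9, 0.1}, // after App
        {0.0, 0.0, 0.0, 0.0, 0.0, 1.0}, // after {
        {0.9, 0.1, 0.0, 0.0, 0.0, 0.0}, // after }
    };

    /** Repeatedly appends the highest-scoring next token until <end> or maxTokens steps. */
    static List<Integer> greedyDecode(int startToken, int maxTokens) {
        List<Integer> output = new ArrayList<>();
        output.add(startToken);
        int current = startToken;
        for (int step = 0; step < maxTokens; step++) {
            int best = 0;
            for (int candidate = 1; candidate < VOCAB.length; candidate++) {
                if (SCORES[current][candidate] > SCORES[current][best]) {
                    best = candidate;
                }
            }
            if (best == 0) {
                break; // the <end> token won: stop generating
            }
            output.add(best);
            current = best;
        }
        return output;
    }

    /** Maps token IDs back to text, separated by spaces. */
    static String detokenize(List<Integer> ids) {
        StringBuilder sb = new StringBuilder();
        for (int id : ids) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(VOCAB[id]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Integer> ids = greedyDecode(1, 10); // start from the "public" token
        System.out.println(ids + " -> " + detokenize(ids));
    }
}
```

Running `main` prints `[1, 2, 3, 4, 5] -> public class App { }`. The model only ever sees and emits integers; the text is a lookup at the end.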
I also tried custom GPTs on the GPTs marketplace. A few Java-focused GPTs gave marginally better results than regular GPT-4, but they still weren’t satisfactory.
I haven’t tried Mistral Medium, but I don’t have much hope for it because its reported benchmark results are slightly lower than GPT-4’s.
I haven’t tried AutoGPT either, because it requires a higher investment than just hiring an average Java freelancer. Google is providing a free API for Gemini Pro (free for any personal use case) that seems to be better than GPT-3.5, but I don’t see any open-source AutoGPT repository that offers the option to use Gemini Pro instead of GPT-3.5 or GPT-4. I understand that Claude or Mistral can’t be used since they aren’t available for free through an API, but why not Gemini? My technical abilities are too limited to edit that code to use Gemini instead of GPT, which also means I can’t really create one from scratch.
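For anyone attempting that swap, a common approach is to hide the model vendor behind one small interface so that only the request-building code changes. The Java sketch below is an illustration under stated assumptions: it targets Gemini’s REST `generateContent` endpoint (`v1beta`, model `gemini-pro`, API key as the `key` query parameter) and OpenAI’s `v1/chat/completions` endpoint. The `LlmProvider` interface and all class names are hypothetical and not taken from any AutoGPT repository. It only builds the HTTP requests; sending them and parsing the responses are left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class ProviderSwapDemo {
    public static void main(String[] args) {
        // Switching vendors is a one-line change here; the rest of the agent code is untouched.
        LlmProvider provider = new GeminiProvider(System.getenv().getOrDefault("GEMINI_API_KEY", "demo-key"));
        String prompt = "Write a minimal Spring Boot REST controller.";
        HttpRequest request = provider.buildRequest(prompt);
        // Print host and path only, so a real API key is never echoed to the console.
        System.out.println(request.method() + " " + request.uri().getHost() + request.uri().getPath());
        System.out.println(provider.buildBody(prompt));
        // Sending would be: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}

// Hypothetical abstraction: agent code depends on this interface, never on a specific vendor.
interface LlmProvider {
    String buildBody(String prompt);
    HttpRequest buildRequest(String prompt);
}

final class Json {
    // Minimal JSON string escaping (quotes, backslashes, common whitespace); not a full JSON library.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}

final class OpenAiProvider implements LlmProvider {
    private final String apiKey;
    private final String model;

    OpenAiProvider(String apiKey, String model) {
        this.apiKey = apiKey;
        this.model = model;
    }

    public String buildBody(String prompt) {
        return "{\"model\":" + Json.quote(model)
                + ",\"messages\":[{\"role\":\"user\",\"content\":" + Json.quote(prompt) + "}]}";
    }

    public HttpRequest buildRequest(String prompt) {
        return HttpRequest.newBuilder(URI.create("https://api.openai.com/v1/chat/completions"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(prompt)))
                .build();
    }
}

final class GeminiProvider implements LlmProvider {
    private final String apiKey;

    GeminiProvider(String apiKey) {
        this.apiKey = apiKey;
    }

    public String buildBody(String prompt) {
        return "{\"contents\":[{\"parts\":[{\"text\":" + Json.quote(prompt) + "}]}]}";
    }

    public HttpRequest buildRequest(String prompt) {
        String url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key="
                + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(prompt)))
                .build();
    }
}
```

The design choice is that the two vendors differ only in URL, authentication and body shape, so those are the only things each implementation encodes. Response parsing would follow the same pattern, with a method that extracts the generated text from each vendor’s JSON.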
It’d be really helpful if anyone knows of a Java fine-tuned LLM, or an AutoGPT framework that uses Gemini or any other LLM available for free through an API (I highly doubt one exists). If you have enough technical expertise to change the code in any of the existing GitHub repositories to use Gemini instead of OpenAI, please do it. It’d be really helpful not just for me, but for the entire community. OpenAI is just too expensive to be usable at the moment.
Please let me know if this is not an appropriate platform for this post, I’ll take it down. Thanks.
submitted by /u/anonymous_abc99