"AI Happenings" 8/27
A lot of news this week: an interesting paper on LLM sensitivity, and the question of whether RL is still relevant.
OpenAI
GPT-3.5 Turbo Fine-Tuning and API Updates
OpenAI has announced the availability of fine-tuning for GPT-3.5 Turbo, allowing developers to customize the model for their specific use cases. Early tests show that a fine-tuned GPT-3.5 Turbo can match or even outperform base GPT-4 on certain narrow tasks. Fine-tuning with GPT-3.5 Turbo is cost-effective, supports larger prompt sizes, and can be combined with techniques like prompt engineering and information retrieval for better performance.
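For readers who want to try it, here is a minimal sketch of the fine-tuning flow using the openai Python SDK as it looked at the time of this announcement; the training filename and the prompt are placeholders of mine, not from the announcement.

```python
import openai  # assumes OPENAI_API_KEY is set in the environment

# Upload a JSONL file of chat-formatted training examples
# (hypothetical filename). Each line looks like:
# {"messages": [{"role": "system", ...}, {"role": "user", ...},
#               {"role": "assistant", ...}]}
training_file = openai.File.create(
    file=open("training_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job against GPT-3.5 Turbo.
job = openai.FineTuningJob.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# Poll until the job finishes; fine_tuned_model is None until then.
job = openai.FineTuningJob.retrieve(job.id)

# Once complete, the fine-tuned model is used like any other chat model.
response = openai.ChatCompletion.create(
    model=job.fine_tuned_model,  # e.g. "ft:gpt-3.5-turbo:org::abc123"
    messages=[{"role": "user", "content": "Draft an expense memo for..."}],
)
print(response.choices[0].message.content)
```

As OpenAI notes, fine-tuning combines well with prompt engineering and retrieval: instructions baked into the weights also shrink per-request prompt costs.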
OpenAI Partners with Scale to Provide Support for Enterprises Fine-Tuning Models
OpenAI and Scale are partnering to offer fine-tuning of OpenAI's advanced models, starting with GPT-3.5 Turbo, with GPT-4 fine-tuning expected to follow. The collaboration lets companies customize and use these models on their proprietary data securely. Scale's expertise in enterprise AI and data enrichment, along with its Data Engine, provides additional benefits to customers looking to optimize their AI deployments.
Adept
No news this week.
Scale AI
OpenAI-Scale Partnership: GPT-3.5 Fine-Tuning
OpenAI has announced a strategic partnership with Scale, providing GPT-3.5 fine-tuning for enterprises. The partnership combines OpenAI's base model with Scale's expertise to create custom state-of-the-art models for specific business needs. The fine-tuned GPT-3.5 model has already shown performance improvements for companies like Brex in automating expense memo generation.
DeepMind
No news this week.
Anthropic
No news this week.
FAIR
Code Llama: A Large Language Model for Coding
Meta has released Code Llama, a family of code-specialized large language models built on top of Llama 2. The models come in 7B, 13B, and 34B parameter sizes, in base, Python-specialized, and instruction-tuned variants, and are free for both research and commercial use. Code Llama supports code completion and infilling, making it suitable for code generation, debugging assistance, and editor integrations.
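As a quick illustration (not from Meta's announcement), here is a minimal sketch of running the base model with Hugging Face transformers; the checkpoint name codellama/CodeLlama-7b-hf and the prompt are assumptions on my part.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed base code-completion checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",
)

# Left-to-right code completion from a bare function signature.
prompt = "def fibonacci(n: int) -> int:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```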
Stability
Stability AI's SDXL Gets a Boost from NVIDIA TensorRT
The Stability AI team has partnered with NVIDIA to speed up its text-to-image model, Stable Diffusion XL (SDXL), by integrating NVIDIA's TensorRT optimization framework. The collaboration enables generation of high-definition images in as little as 1.47 seconds, with the TensorRT-optimized model significantly faster and more efficient than the baseline. The speedup has broader implications for democratizing generative AI, making it cheaper and more accessible for individuals and organizations to harness these models for innovation and creativity.
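For reference, this is what the unoptimized baseline looks like with the diffusers library; it is the slower reference point in Stability's benchmark, not NVIDIA's TensorRT integration itself, whose engine-build steps live in NVIDIA's own tooling. The model id is the public SDXL base checkpoint; the prompt is a placeholder.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Baseline PyTorch pipeline; the TensorRT path swaps the UNet and VAE
# for compiled engines, which is where the reported speedup comes from.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe(
    prompt="a lighthouse on a cliff at sunset, photorealistic",
    num_inference_steps=30,
).images[0]
image.save("sdxl_baseline.png")
```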
Cabrita: closing the gap for foreign languages
The strategy of training the model from scratch in a specific language or domain serves two essential purposes: i) enhancing performance in the particular linguistic or domain context, and ii) ensuring effective tokenization. The main limitation inherent to this approach lies in the associated cost, which can reach six- to seven-digit dollar values, depending on the model size and the number of parameters involved. The main solution to overcome the cost challenge is to rely on available pre-trained models, which, despite recent advancements such as the LLaMA and LLaMA-2 models, still demonstrate inefficiency for certain specific domain problems or prove ineffective in scenarios involving conversational memory resources, given the large number of tokens required to represent the text. To overcome this issue, we present a methodology named Cabrita, which, as our research demonstrates, successfully addresses the performance and efficient tokenization problem, all at an affordable cost. We believe that this methodology can be applied to any transformer-like architecture model. To validate the study, we conducted continuous pre-training exclusively using Portuguese text on a 3-billion-parameter model known as OpenLLaMA, resulting in a model named openCabrita 3B. The openCabrita 3B also features a new tokenizer that results in a significant reduction in the number of tokens required to represent the text. In our assessment, for few-shot learning tasks, we achieved results with this 3B model similar to those of a traditional continuous pre-training approach, as well as to English pre-trained 7B models.
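The tokenizer claim is easy to see concretely. Below is a minimal sketch, not the paper's actual pipeline: it counts how many tokens OpenLLaMA's stock tokenizer spends on a Portuguese sentence, then trains a Portuguese SentencePiece tokenizer (the corpus path is a placeholder) to show how a language-specific vocabulary packs the same text into fewer tokens.

```python
import sentencepiece as spm
from transformers import AutoTokenizer

text = "A adaptação do tokenizador reduz o número de tokens necessários para representar o texto."

# How many tokens does the original OpenLLaMA vocabulary need?
original = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b")
print("original:", len(original.tokenize(text)))

# Retrain a tokenizer on Portuguese text (hypothetical corpus file).
spm.SentencePieceTrainer.train(
    input="portuguese_corpus.txt",
    model_prefix="pt_sp",
    vocab_size=32000,
)
adapted = spm.SentencePieceProcessor(model_file="pt_sp.model")
print("adapted:", len(adapted.encode(text)))
```

Fewer tokens per sentence means cheaper inference and a longer effective context window, which is exactly the conversational-memory problem the abstract calls out.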