That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just spotted the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

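Something like this; the Q8_0 quantization tag is an assumption on my part, chosen to match the roughly 8.5GB download mentioned below (Ollama can pull GGUF models directly from Hugging Face):

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
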
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

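If you want to poke at those logs, LLM ships commands for exactly that, for example:

```bash
llm logs path  # print the location of the SQLite logs.db file
llm logs -n 1  # show the most recent logged prompt and response
```
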
I installed the plugin and ran it like this:

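That presumably looked something like the following; the model ID here is an assumption based on the Ollama model name used above:

```bash
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
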
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine), I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

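A plausible command, assuming the same Unsloth GGUF naming pattern as the 8B model; the Q3_K_M quantization tag is my guess to match the 34GB size:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
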
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

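The entry would look something like this, following the extra-openai-models.yaml format from the LLM docs; the exact field values here are my best guess:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
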
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner` to run prompts.

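Put together, that sequence looks like this (the example prompt is just an illustration):

```bash
llm keys set deepseek
# paste your DeepSeek API key when prompted
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```
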
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.

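In the meantime you can hit the API directly to see them; a sketch using curl, assuming the reasoning_content field DeepSeek document for deepseek-reasoner responses:

```bash
curl -s https://api.deepseek.com/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}]
      }' | jq -r '.choices[0].message.reasoning_content'
```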