Hacker News | trilogic's comments

Adriano sei forte uei! (Meaning: "Adriano, you're cool, hey!") An Italian LEGEND: very smart, rich in every way, and humble (you too, Claudia). From somebody you used to know :)

Celentano and Mori are many things, but humble is not one of them... He's a preachy Catholic bore, completely detached from reality; and she's deluded that they're still big stars. They used to be something, but it's been a long time since they said anything worth listening to.

Adriano is charismatic (the public confirms it: they pay for his songs and movies). Claudia Mori, besides being a beautiful actress and singer, was the heart of their public relations, which paid off very well. When I met them they were funny and happy, along with, by extension, the typical VIP behavior. Everyone has a bad day sometimes, but overall they get a very positive sentiment. Still listening to: Intanto il tempo se ne va ...

Don't give up on native agents; the best logic will prevail. Open weights will show the real deal.

It would be ironic if I were now replying to a bot while everyone else assumes the contrary.

It is a great browser, thank you. Would you consider adding an option that flags scam websites, and especially the ones that give no option to deny cookies or make it hellishly difficult, which puts them in non-compliance with GDPR? That is data you would be glad to sell: we get a better service, and EU warriors make some money on it.

>Anthropic is currently in talks with investors to raise money at a $900 billion valuation, which would push it ahead of OpenAI.

How do you go from 380 to 900 billion in a month? I am very curious. So now Anthropic is valued at 900 billion! Journalism these days is worse than my kids' social media channel. Totally, I believe you, go for it, it's just one more zero, bro. Everyone brace for impact.

Let's do it too. Breaking News: HUGSTON in talks with investors, now valued at 1 billion euros.


> How do you go from 380 to 900 billion in a month? I am very curious.

Mythos Marketing.


Well, having "Mythos" at our offices (maybe not quite as good, but 90% of it, or maybe even better): would that be worth 1 billion? Just saying!

They should create a giant AI LLM model trained on that data. Then settle with some form of payment like others did (learning from the best LOL). Also, I don't understand why a book, once bought, can't be uploaded online. If you are not engaging in commercial activity I don't see the issue: the book was bought, it is not a state secret. By that logic the cookie trackers, which literally track and spy on you and illegally buy and sell your data for profit and more, should be the priority, not some books that educate people.

> They should create a giant AI LLM model trained on that data.

It's interesting that Anna's could have kept the data to themselves and had a major advantage in training LLMs, either creating their own or charging possibly billions to large LLM companies.


Qwen 3.6 35B (finetuned) is so good that it has become the standard open-weights model for everyday use. It is not far at all from proprietary models if you give it tools, skills and agents etc, and it can actually finish the job. (Thank you Qwen team, appreciated.) With open source we can now reliably design very complicated architectures from scratch and build the full package pretty fast. I wish to see European AI unleashed. Wake up.


> It is not far at all from proprietary models if you give it tools, skills and agents etc,

I use Qwen 3.6 27B, the dense version of this model which is slightly better.

I don't agree that it's close at all. Maybe for some small, easy tasks, but not for working on real codebases. It's amazing for something I can run at home, but the difference between it and Opus or GPT-5.5 is huge.


Really, how so? We work with codebases daily, so can you give us a concrete example? In our case we work on consumer hardware (ish) with 10 million ctx (1 million output and 1 million input proven; sometimes it loops or breaks at over 500k ctx, but it runs at ~17 tps, linear). It can read the full codebase, unleash agents, and write to disk, editing and patching files and creating a full app in 3-4 minutes. It can do web search and RAG pretty fast, and it understands and fixes the user query and system prompts, adapting them on the fly if needed. I am wondering, what more do you do?


Edit: Forgot to mention that it can process images, PDFs, and hundreds of other file types; it can even create presentations in code or Mermaid, SVG, Chart.js etc. Here is a basic version of it: https://hugston.com/chat


How do you do 1M context with Qwen 3.6 27B, which only supports 256k? And what hardware would you run that on? 2 x 3090 is, AFAIK, currently at max 256k context.


You can get all the Qwen 3.x models up to ~1 million tokens using YaRN with llama.cpp.[0]

Personally I am using `--no-context-shift` and feeding context back in on failure at the harness level.

I have 2x1080ti + 1xTitanV that handle the full 262,144-token context with `-sm tensor` at 62.04 t/s, which isn't so bad.

I also have a 1x3090 running unsloth/Qwen3.6-27B-MTP-GGUF:UD-Q4_K_XL at 41.89 t/s, but with only 130k context. If you have a modular programming style, both work pretty well.

But play with YaRN if you really need it.

[0]https://qwen.readthedocs.io/en/v3.0/run_locally/llama.cpp.ht...
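For anyone wanting to try the YaRN route, here is a minimal sketch of the arithmetic behind the scale factor. It assumes the 262,144-token native window quoted in this thread and rounds the factor up to a whole number (YaRN itself does not require that). The `--rope-scaling`, `--rope-scale` and `--yarn-orig-ctx` flags are the ones I believe llama.cpp uses for this; check `llama-server --help` on your build before relying on them.

```python
import math


def yarn_scale(native_ctx: int, target_ctx: int) -> int:
    """Smallest whole-number RoPE scale that stretches native_ctx to cover target_ctx."""
    return max(1, math.ceil(target_ctx / native_ctx))


native_ctx = 262_144    # native window quoted in this thread
target_ctx = 1_000_000  # the "~1 million tokens" goal

scale = yarn_scale(native_ctx, target_ctx)
extended_ctx = native_ctx * scale

# Assumed llama.cpp flags (not taken from the thread); verify against your build.
args = [
    "--rope-scaling", "yarn",
    "--rope-scale", str(scale),
    "--yarn-orig-ctx", str(native_ctx),
    "--ctx-size", str(extended_ctx),
]

print(scale, extended_ctx)
print(" ".join(args))
```

A larger context also means a larger KV cache, so the extended window only helps if it still fits in VRAM.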


How can you get it to run at 41 t/s? I also have a single 3090 and even with MTP can't break 20 t/s.

Here's my setup:

  llama-server \
    --port 9999 \
    --model /MODELS/LLMs/Qwen3.6-27B-UD-Q4_K_XL.gguf \
    --ctx-size 128000 \
    --threads 12 \
    --flash-attn on \
    --device CUDA0 \
    --jinja \
    --gpu-layers 52 \
    --mmproj /MODELS/LLMs/Qwen3.6-27B-mmproj-F16.gguf \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 \
    --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.0 --repeat-penalty 1.0 --presence-penalty 0.0 \
    --spec-type draft-mtp --spec-draft-n-max 2

(I'm not filling out 100% of the VRAM, as I have other stuff I need it for.)


(Note UPDATED config)

Yeah, if you are using the CPU it may slow down quickly.

This may be a bit large and overcomplicated. On this host I am running it on an AMD Ryzen 7 5700G so that I can use the APU and dedicate the 3090 to the model.

    podman run --device nvidia.com/gpu=all -d -v llama_qwen3.6mpt:/root/.cache -p 8080:8080 local/llama.cpp:full-cuda --server \
    -hf unsloth/Qwen3.6-27B-MTP-GGUF:UD-Q4_K_XL \
    -ngl 99 \
    --ctx-size 131072 \
    --no-mmproj-offload \
    --no-context-shift \
    --kv-unified \
    --spec-type draft-mtp \
    --spec-draft-n-max 6 \
    --spec-draft-p-min 0.75 \
    -fa on --jinja --no-mmap \
    --cache-ram -1 \
    --no-warmup -np 1 \
    -n 32768 \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 \
    --temp 0.6 \
    --min-p 0.00 \
    --top-k 20 \
    --top-p 0.95 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.05 \
    --fit off \
    --reasoning on \
    --chat-template-kwargs '{"preserve_thinking":true}' \
    --prio 3 \
    --poll 100 \
    --port 8080 \
    --host 0.0.0.0

I am just building the container with:

     podman build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .

And here are the logs from a 'make me a flappy bird program in python' webui prompt.

     prompt eval time =     105.86 ms /    19 tokens (    5.57 ms per token,   179.47 tokens per second)
       eval time =  100549.41 ms /  4608 tokens (   21.82 ms per token,    45.83 tokens per second)
      total time =  100655.28 ms /  4627 tokens
     draft acceptance rate = 0.47215 ( 3408 accepted /  7218 generated)

I am down to ~25.54 t/s with the context 95% full.
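If you want to compare runs, the headline numbers in those logs can be recomputed from the raw counts. A small sketch, using only the values printed above; the function names are mine, not part of llama.cpp.

```python
def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Throughput from a token count and an elapsed time in milliseconds."""
    return tokens / (elapsed_ms / 1000.0)


def acceptance_rate(accepted: int, generated: int) -> float:
    """Fraction of drafted tokens that the main model accepted."""
    return accepted / generated


# Values copied from the log lines above.
eval_tps = tokens_per_second(4608, 100549.41)
accept = acceptance_rate(3408, 7218)

print(f"{eval_tps:.2f} t/s")  # 45.83 t/s
print(f"{accept:.5f}")        # 0.47215
```

With under half of the drafted tokens accepted, a long draft length wastes a lot of work, which is one plausible reason a smaller `--spec-draft-n-max` can end up faster.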


That config looked too complicated. Getting rid of `--prio 3` and `--poll 100`, setting `--spec-draft-n-max` to the now-recommended value, etc. kicked it up to 61 t/s.

I think those settings were all about some earlier crashes.

     podman run --device nvidia.com/gpu=all -d -v llama_qwen3.6mpt:/root/.cache -p 8080:8080 local/llama.cpp:full-cuda --server \
    -hf unsloth/Qwen3.6-27B-MTP-GGUF:UD-Q4_K_XL \
    -ngl 99 \
    --ctx-size 128000 \
    --no-mmproj-offload \
    --no-context-shift \
    --kv-unified \
    --spec-type draft-mtp \
    --spec-draft-n-max 2 \
    --spec-draft-p-min 0.75 \
    -fa on --jinja --no-mmap \
    --cache-ram -1 \
    --no-warmup -np 1 \
    -n 32768 \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 \
    --temp 0.6 \
    --min-p 0.00 \
    --top-k 20 \
    --top-p 0.95 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.05 \
    --fit off \
    --reasoning on \
    --chat-template-kwargs '{"preserve_thinking":true}' \
    --port 8080 \
    --host 0.0.0.0


Yeah, having even a little bit in the CPU tanks the t/s...

But thanks. I've learned a few more configurations to tinker with.


You can increase the context window beyond its max trained context using RoPE scaling[0] which will require more VRAM.

But you can increase your context window for the same VRAM by quantizing the KV cache with FP8 (double the context) or TurboQuant (more than double)[1].

0: https://medium.com/@leannetan/extending-context-length-with-...

1: https://docs.vllm.ai/en/latest/features/quantization/quantiz...
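To make the "double the context" claim concrete, here is a rough sketch of the KV cache arithmetic. The layer count, KV head count, head size and VRAM budget are hypothetical placeholders, not the real dimensions of any Qwen model, and the formula assumes every layer keeps a full-attention KV cache.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: float) -> float:
    """Bytes of KV cache per token: one K and one V vector per layer per KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def tokens_that_fit(budget_bytes: int, per_token_bytes: float) -> int:
    """How many tokens of KV cache fit in a given memory budget."""
    return int(budget_bytes // per_token_bytes)


# Hypothetical model dimensions and budget, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128
BUDGET = 8 * 1024**3  # 8 GiB set aside for the KV cache

fp16_tokens = tokens_that_fit(BUDGET, kv_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM, 2))
fp8_tokens = tokens_that_fit(BUDGET, kv_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM, 1))

print(fp16_tokens, fp8_tokens)  # 32768 65536
```

Halving the bytes per element halves the per-token cost, so the same budget holds twice as many tokens. Formats that store extra scale data per block land slightly below an exact doubling.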


We managed to increase the context for any LLM that has been converted to GGUF; here are the experimental tests: https://www.reddit.com/r/Hugston/


I've had the opposite experience, and have built multiple fantastic applications with Qwen3.6 27b. What quantization have you tested with?


Similarly I haven't seen Qwen 27B as remotely competitive with Opus, at least Q4 hooked up to Claude Code. What harness are you using?


As funny as it may sound, a q4_k_m that is well converted and properly quantized (and finetuned, imperative) would do the job. The 27B may be good but it is heavy; it burns the hardware. I personally prefer the 397B if I am stuck and can't progress; it can still run at 7 tps. Now with MTP (multi-token prediction) it nearly doubles the speed (reached 82 tps today with the 35B at 100000 ctx). I recommend you give it a try.


> not for working on real codebases

You don't pick just one model to "work on real codebases". You use a very advanced model to plan, and a not-very-advanced, cheaper, faster model to execute planned tasks. This saves money and speeds up work. This is the guidance from Anthropic & OpenAI.
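A minimal sketch of that split, with the two models represented as plain callables. `fake_planner` and `fake_executor` are hypothetical stand-ins for whatever API or local server you actually call; nothing here is a vendor's recommended implementation.

```python
from typing import Callable, List

# A "model" here is just a function from prompt text to response text.
Model = Callable[[str], str]


def run_task(task: str, planner: Model, executor: Model) -> List[str]:
    """Ask the strong model for a step list, then run each step on the cheap model."""
    plan = planner(f"Break this task into short, independent steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [executor(f"Task: {task}\nStep: {step}") for step in steps]


# Hypothetical stubs so the sketch runs without any model behind it.
def fake_planner(prompt: str) -> str:
    return "write parser\nwrite tests\n"


def fake_executor(prompt: str) -> str:
    step = prompt.rsplit("Step: ", 1)[1]
    return f"done: {step}"


results = run_task("add a CSV importer", fake_planner, fake_executor)
print(results)  # ['done: write parser', 'done: write tests']
```

In practice the planner call happens once per task while the executor runs many times, which is where the cost and speed savings would come from.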


It's 3.7-max; max was never open-weighted before. I don't see any smaller models in that tweet.


For coding it’s really bad. Writing is ok, chat is good. It’ll get better but it’s not that close yet


Bad is mystifying. Unassisted but for handing it a pile of PDFs of relevant academic papers and my initial codebase, I had hermes agent based on qwen-3.6 27B implement karatsuba multiplication of characteristic-2 polynomials in C++ in an existing codebase with an internal field arithmetic library. It correctly found the 'obvious' optimizations using the field properties. Then I had it implement the recursive halfgcd algorithm for these polynomials using it.

It wrote extensive test cases and validated them with mutation testing (per my standard instructions)-- took many tries getting the algorithms right but with the tests handy it found and fixed the errors.

It's inconceivable to me to call it bad!


Depends on the language and harness, I guess.

It works really well for me, at least for Python and JavaScript, with swival.dev as a harness.


You should probably disclose that you're the author of swival.dev, but nice project :)


Do you have a good resource on how to finetune a model like Qwen? I am curious to try it out.


Here is a dataset you can choose from: https://huggingface.co/datasets/Avtrkrb/combined-reasoning-o... Take 10,000 samples from it according to your needs and go for it. The key (in my opinion) is, among other things, not cutting the sequence length. Any traditional finetuning repo will do; if your hardware supports it, Unsloth is faster.


Unsloth has good resources


Can you share the GGUF for this specific success story? I'd like to try it for myself.


It is a step in the right direction, but time is a key factor also. Do not take it easy, hurry up.

It is imperative to stop the data leak this month (May 2026).


Outrageous. Malta is in Europe and needs a European provider; this is a European security issue. Malta needs to align with European values, as per its agreement with the EU.


You're not going far enough.

Closed software has no place in government whatsoever. It should all be open and/or locally run, and ideally GPL/AGPL licensed.

It is another level of stupidity/moral failing to use closed software supplied from outside your nation, though.


There is nothing money can't buy, morals included. That does not change the stupidity, so I am with you there.


A quick search for the revenue of Google and Facebook, 2015-2020 (in billions of USD). Is it so hard to understand that there is an entire economy wrapped around AI/IoT? Didn't Europe learn anything from historical data?

    Year   Google*    Facebook   Total Alphabet
    2015   $67.80B    $17.93B    $74.98B
    2016   $81.30B    $25.76B    $89.55B
    2017   $100.10B   $40.65B    $109.46B
    2018   $128.90B   $55.84B    $136.82B
    2019   $143.90B   $70.70B    $161.86B
    2020   $152.70B   $85.97B    $182.53B

    * Google segment revenue (Alphabet restructuring).

Europe GDP for the same years (in trillions):

    Year   GDP
    2015   16.89
    2016   16.88
    2017   17.88
    2018   18.89
    2019   19.31
    2020   17.42

Now, by simple math, healthy GDP growth is around 4%, so just creating and/or backing two similar companies in Europe would bring in revenue of ~2.5% of total European GDP. What is going on? Are the European leaders sabotaging our economy on purpose?
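A quick check of that share, using only the 2020 figures quoted above and taking them at face value. Note that revenue is not the same thing as contribution to GDP (which counts value added), so treat this as a rough ratio only.

```python
# 2020 figures as quoted in this comment, in billions of USD.
google_2020 = 152.70       # Google segment revenue
facebook_2020 = 85.97
alphabet_2020 = 182.53     # total Alphabet revenue
europe_gdp_2020 = 17.42 * 1000  # trillions converted to billions


def share_percent(revenue_billion: float, gdp_billion: float) -> float:
    """Revenue expressed as a percentage of GDP."""
    return 100.0 * revenue_billion / gdp_billion


segment_share = share_percent(google_2020 + facebook_2020, europe_gdp_2020)
alphabet_share = share_percent(alphabet_2020 + facebook_2020, europe_gdp_2020)

print(f"{segment_share:.2f}% {alphabet_share:.2f}%")  # 1.37% 1.54%
```

On these numbers the combined revenue comes to roughly 1.4% to 1.5% of the quoted GDP, below the ~2.5% mentioned, so that figure presumably rests on assumptions not spelled out here.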

