[HN Gopher] Video-LLaVA
___________________________________________________________________
 
Video-LLaVA
 
Author : tosh
Score  : 146 points
Date   : 2023-11-21 17:31 UTC (5 hours ago)
 
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
 
| bobosha wrote:
| This is a very cool project! Kudos to the authors for being on
| top of it and keeping the features coming. It appears to be
| feature-competitive with OpenAI's GPT-4V `vision` endpoint.
 
| whimsicalism wrote:
| Researchers seem very comfortable sticking "Apache 2.0" licenses
| all over their foundation model finetunes.
| 
| This model is absolutely not Apache 2.0 in reality (it's a Vicuna
| finetune, never mind the sourcing of the finetune dataset), and I
| would use it for business at your peril.
 
  | Der_Einzige wrote:
  | Fine-tuning the weights scrambles the original representations
  | (sometimes more than others, depending on training settings,
  | but if you train the text encoder it certainly will). All the
  | authors have to do is not be honest about the original model it
  | was fine-tuned on, in a world where lawyers start to come down
  | on this.
  | 
  | I see no issue for businesses using it.
 
    | whimsicalism wrote:
    | I don't know - it sounds like your default assumption is that
    | there is no issue because businesses can commit copyright
    | infringement/fraud and not be caught. I am not a lawyer, so I
    | can't comment on the merits of that approach.
    | 
    | Generally, I think it is difficult for businesses to break
    | the law, given that any one of the members might defect on
    | you.
    | 
    | Also, I suspect that the logprobs for various sequences would
    | reveal which foundation model you used.
 
  | yeldarb wrote:
  | Looks like the Vicuna repo is Apache 2.0 also [1].
  | 
  | What's the interpretation of copyright law that would prevent
  | the code being Apache 2.0 based on the source of the
  | fine-tuning dataset?
  | 
  | [1] https://github.com/lm-sys/FastChat
 
    | whimsicalism wrote:
    | Not quite: FastChat is the inference code, which is Apache
    | 2.0 but distinct from the model artifact. If you look at the
    | model [0], it is licensed as non-commercial.
    | 
    | But why?
    | 
    | Well, for one, Vicuna is a Llama finetune, which already
    | excludes it from being Apache 2.0. It's also finetuned on OAI
    | data, which is... questionable in terms of license (I don't
    | think you can really legally license a model trained on OAI
    | output as Apache 2.0 - although OAI doesn't really play by
    | its own rules, so who knows).
    | 
    | [0]: https://huggingface.co/lmsys/vicuna-13b-v1.3
 
      | yeldarb wrote:
      | Which part of copyright law are model weights governed by?
      | (Or, if not by copyright law, what's the legal basis that
      | would let you choose a "license" for model weights?)
 
  | dartos wrote:
  | To be fair, the Llama license allows for small-business usage.
  | 
  | But also, these models aren't watermarked or anything (not
  | that watermarking really works), so it's kind of the wild
  | west.
 
| kyriakos wrote:
| I honestly have no idea what this project is about. It may be
| because I'm completely out of the loop regarding LLMs, but
| still...
 
  | fkyoureadthedoc wrote:
  | I had no idea from the name, but the README does a good job of
  | explaining what it's about. It even has a nice video demo.
 
  | abrichr wrote:
  | Open-source question answering over videos:
  | 
  | > With the binding of unified visual representations to the
  | language feature space, we enable an LLM to perform visual
  | reasoning capabilities on both images and videos
  | simultaneously.
 
    | kyriakos wrote:
    | Thanks
 
  | btbuildem wrote:
  | The related paper is here:
  | https://arxiv.org/pdf/2311.10122.pdf
  | 
  | I think the TL;DR is "it can tell what's in the video and
  | 'reason' about it".
 
| astrea wrote:
| Side note: why does every GitHub README look like a children's
| book these days? Emojis, big colorful graphics, GIFs, cute
| project logo, etc.
| Makes me feel awkward trying to read about a serious topic with
| the ":o" emoji staring me in the face. I'm just waiting for the
| air horns to start blaring and a dancing cat to slide across my
| screen.
 
  | chankstein38 wrote:
  | Because you're dealing with humans, and sometimes humans don't
  | behave in the way you apparently expect everyone to? These
  | aren't massive billion-dollar corps; they're an engineer or a
  | group of engineers doing something interesting to them.
  | 
  | In this case it seems related to a university, so these are
  | students and researchers. Some of them are very likely
  | qualifiable as kids to us old people.
  | 
  | Not sure why it's such a bother to you. Does a topic need to
  | be cold and black-and-white for it to further our
  | technological research? (That's rhetorical, because this repo,
  | for instance, absolutely furthers our tech abilities while
  | also being in a more friendly, non-academic format.)
 
  | Implicated wrote:
  | The closer to Discord a community is, the more things look
  | this way - at least that's my interpretation.
 
  | devmor wrote:
  | Emojis are part of the common vernacular now, and software
  | development is a mainstream career instead of a siloed-off
  | nerd haven.
 
  | j45 wrote:
  | Because it's more inviting to people beyond just those who
  | like text alone.
  | 
  | https://shuiblue.github.io/forcolab-uoft/paper/IST2022-emoji...
 
    | dartos wrote:
    | I love that this exists.
 
      | j45 wrote:
      | Me too.
      | 
      | Not to say a study can't often be found for most
      | viewpoints.
 
  | geysersam wrote:
  | Couldn't agree more!
 
  | dymk wrote:
  | Do you use syntax highlighting?
 
  | dvngnt_ wrote:
  | You could also ask why serious writing often avoids big,
  | colorful graphics, if they look better.
 
| rajamaka wrote:
| Demo just errors out, unfortunately.
___________________________________________________________________
(page generated 2023-11-21 23:00 UTC)