[HN Gopher] Forking Chrome to turn HTML into SVG
___________________________________________________________________
Forking Chrome to turn HTML into SVG

Author : fathyb
Score  : 222 points
Date   : 2022-11-13 17:06 UTC (5 hours ago)

(HTM) web link (fathy.fr)
(TXT) w3m dump (fathy.fr)

| lifthrasiir wrote:
| > What if we could also vectorize 2D <canvas> elements controlled
| by JavaScript? Turns out, Chromium has this capability built-in
| for printing:
|
| I'm very surprised to hear this. So printing, either to PDF or to
| actual printers, may reveal more information about what was drawn
| to the canvas than normal display, especially if no effort has
| been made to remove overdrawn paint records. That can have an
| interesting, if only hypothetical, consequence...
| tyingq wrote:
| Pretty sure canvas.toDataURL() is or was a fingerprinting
| method.
| andybak wrote:
| Being fair, if anyone is sending anything to the client and
| assumes it's not visible then they are fair game.
|
| I just hope the code around password entry fields is carefully
| audited. That's all on the client.
| kevincox wrote:
| Yes, but there is more than that there. What if I as the
| client try to print a page or export to PDF. I think that
| there is nothing sensitive visible on the page so I share the
| result. It turns out that there was actually sensitive info
| in the canvas that was not visible due to something like
| overdraw.
|
| As a simple example imagine that an image is drawn to the
| canvas and then blacked out. You wouldn't expect that the
| saved PDF may contain those as separate layers.
|
| Of course this highlights an existing issue with complex
| formats. You need to be very careful before sharing complex
| documents.
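
The overdraw concern in the comments above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FillRect`, `rasterize`, `visible_labels` and `vector_export` are hypothetical names invented for the illustration, not Skia or Chromium APIs, and a real export path may well cull occluded operations. It shows why a naive vector export of a recorded paint list can carry content that never reaches the screen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FillRect:
    """One recorded paint operation: an axis-aligned filled rectangle."""
    x: int
    y: int
    w: int
    h: int
    label: str


def rasterize(ops, width, height):
    """Paint ops in order onto a pixel grid; later ops overwrite earlier ones."""
    grid = [[None] * width for _ in range(height)]
    for op in ops:
        for row in range(max(op.y, 0), min(op.y + op.h, height)):
            for col in range(max(op.x, 0), min(op.x + op.w, width)):
                grid[row][col] = op.label
    return grid


def visible_labels(grid):
    """Labels that still own at least one pixel after rasterization."""
    return {cell for row in grid for cell in row if cell is not None}


def vector_export(ops):
    """A naive vector export keeps every recorded op, occluded or not."""
    return [op.label for op in ops]


ops = [
    FillRect(1, 1, 2, 2, "sensitive image"),  # drawn first
    FillRect(0, 0, 4, 4, "black box"),        # drawn on top, fully covers it
]

on_screen = visible_labels(rasterize(ops, 4, 4))
in_export = vector_export(ops)
print(on_screen)   # what a viewer of the raster output can see
print(in_export)   # what a reader of the vector file can recover
```

In this toy model the raster output contains only the covering box, while the exported list still names the covered image. That is the same gap as the "black box over redacted text" cases discussed in the replies.
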
| version_five wrote:
| There's a story from years ago (I couldn't find it) about
| some government or legal documents having info redacted, but
| whoever did it just used some pdf editing tool to draw black
| boxes over the redacted parts, so all the info was still in
| the pdfs.
|
| Edit: I found this but I'm not sure it's the one I'm
| remembering:
| https://www.techdirt.com/2014/01/28/new-york-times-suffers-r...
| perth wrote:
| This happened with the Maxwell redacted court docs
| mk_stjames wrote:
| This is exactly the case. I've done conversions before where it
| was possible to see and extract underlying, hidden elements,
| that were not visible or even detectable in the rendered
| webpage in a browser.
|
| This is actually a somewhat common method when it comes to a
| bit of corporate sleuthing.. anytime you see a pretty website
| with vector-y graphics, maybe engineering-drawing
| representations.. if the data hasn't been stripped completely
| or redrawn you can extract information that otherwise people
| would assume unknowable.
|
| In a recent example... I did this on a startup company's page
| involving a product where they had a CAD-like side view drawing
| of one of their products... but the base file (in this case it
| was an SVG) driving the page actually contained multiple hidden
| views of the same product and other products and at the 'real'
| precision of what likely was a DXF export from a CAD program,
| given to the web team. This allowed a critical dimension of an
| unannounced product to be precisely determined (to three
| significant figures) which was a spec that had not been
| publicly released...
| aidos wrote:
| Totally. We see architectural drawings that go through a number
| of revisions and it's not uncommon for designers to simply
| cover a whole section with a white box and then draw on top of
| it.
|
| Also within PDFs (and svgs) you normally clip the area you're
| going to draw into to bound it (sort of like overflow:hidden)
| and anything outside of that doesn't display, but it's still
| there and accessible.
|
| I marvel more at the fact that software is capable of figuring
| out all the occlusions so you can print the stuff on a plotter.
| CAD drawings have up to 2M individual vectors in them. It's
| impressive that it works at all to be honest.
| [deleted]
| steren wrote:
| Much cleaner than my hack of Chrome -> PDF -> Inkscape -> SVG
|
| https://labs.steren.fr/2020/05/08/screenshot-as-svg/
| cjr wrote:
| ha, I'm also guilty of using this method on https://urlbox.io
| to power our SVG screenshots.
|
| To be honest, it works quite well, but there are quite a few
| bugs in chromium's pdf rendering, especially when it comes to
| determining the correct page width to apply media queries to,
| which sometimes affects the accuracy of these SVGs.
| SigmundA wrote:
| Reminds me of https://github.com/gliffy/canvas2svg at a different
| level of abstraction.
|
| I believe PDF.js incorporated some form of canvas2svg to try and
| get a SVG backend working which would allow high resolution
| printing to PDF but not sure where that's at. I believe printing
| through PDF.js is blurry due to memory constraints since with
| normal canvas pdf pages just end up as bitmaps sent to the printer.
|
| SVG ends up staying vector through Chromium's print pipeline
| resulting in much less memory usage while having much higher dpi
| final output. I would imagine this is due to SVG being turned
| into Skia drawing commands that end up as PDF that then gets
| printed through PDFium?
| pornel wrote:
| It'd be wonderful if this could be integrated into the browser
| and the OS to provide SVG screenshots.
| hedora wrote:
| I'd love to see some sort of caching proxy that did this for news
| stories, etc.
|
| Basically, convert everything to an archival format, then I'll
| browse the archive instead of whatever adversarial server side /
| javascript junk the site is serving.
| bawolff wrote:
| Well both pdf and svg support javascript (albeit pdf is
| extremely limited)
| ccouzens wrote:
| If you could proxy the page to SVG without Javascript, couldn't
| you also proxy the page to HTML without Javascript?
|
| Either way, you'd probably want your proxy to wait for any
| onload Javascript to run before snapshotting the page.
| metayrnc wrote:
| Can someone give some example usecases? I am curious as to how
| this is used. Thank you.
| commotionfever wrote:
| if it works how i think it does, this could be really nice to
| cook up some infographics in a css framework like tailwind.
| then make some svgs for a github readme
|
| for example i made this one[1] with tailwind but i just ended
| up taking a png screenshot
|
| [1] https://github.com/sentriz/socr/blob/master/.github/socr.png
| danielvaughn wrote:
| IMO the use case is limited but interesting. The most obvious
| would be product screenshots for landing pages, although
| typically design tools handle that well enough.
|
| I'm currently building a web-app for building web pages, and
| I'd love for the user to be able to view a thumbnail gallery of
| all the pages they've built. This tool would allow me to build
| a zooming feature pretty easily.
|
| Outside of those two, I'd imagine the use cases are fairly
| limited.
| btown wrote:
| The thing about having access to the Skia render graph is
| that all of a sudden you're no longer limited to product
| screenshots and screen recordings. Imagine a pipeline where
| you can export someone's interaction session with a site,
| pixel-perfect, into DaVinci Resolve or Blender or Unity as a
| fully annotated DOM-advised render node hierarchy, with
| consistent node identities over time, of every rendered
| element on the page as it changes across frames.
| That's _way_ more powerful than just pixels.
|
| Imagine flying through your site in 3D (or even VR) with full
| control over timing, being able to explode and un-explode
| your DOM elements as they transition into being - the type of
| thing that only Apple would do for their WWDC demos with
| dedicated visualization teams.
|
| The start is to be able to see the rendering engine as a
| generator for not just raster data over time, but vector data
| over time. Of course, there's a lot of work to do from there,
| but this is the core leap.
| Scalene2 wrote:
| Great for screenshots to render in a video.
| convolvatron wrote:
| if we can reduce the size of the basis footprint for a browser
| implementation, we can more easily produce new browsers (i.e. by
| implementing a fully general Path, and font rendering)
| mrkramer wrote:
| That would be cool if actually converting HTML to SVG would
| save you bandwidth and all the rest that goes with web
| requests. Imagine a web browser that only supports SVG and
| converts all HTML to SVG then when browsing the web you would
| only look at screenshots of websites and webpages. This would
| be something like a read-only browser. It is already
| possible[0] tho but it is not enabled by default on Chrome
| nor is it an exclusive feature.
|
| [0] https://frankgroeneveld.nl/2021/08/24/most-underused-browser...
| GranPC wrote:
| I do something similar - but using the Print command and
| converting the PDF to SVG - to import websites into Blender for
| flashy animations. This allows me to neatly animate things
| in/out, and zoom into details without pixelation.
| yvoschaap wrote:
| I tried something similar like this to render thumbnails of
| websites (at a very small file-size). E.g.
| https://twitter.com/yvoschaap/status/1446397003316047872
| mrkramer wrote:
| Isn't this something like archive.ph is doing? Snapshotting
| and screenshotting websites. I'm referring both to you and
| the op.
| dj_gitmo wrote:
| https://archive.ph/1NNZr
|
| That looks like a Web Archive (WARC) and a PNG screen shot.
| I think you can make a screenshot with CasperJS. The WARC
| can be created by wget.
| mrkramer wrote:
| Yea you are right about PNG but wrong about WARC.
| Archive.ph doesn't use WARC.
| dredmorbius wrote:
| What does it use, if you know?
|
| Source?
| mrkramer wrote:
| Their FAQ says:
| https://archive.ph/faq#:~:text=Which%20parts%20of,of%201024x....
|
| Wikipedia says:
| https://en.wikipedia.org/wiki/Archive.today#:~:text=Web%20pa....
|
| So I assumed they don't use it but idk for sure.
| marginalia_nu wrote:
| How small are you getting them? I'm straight up
| screenshotting websites (e.g.
| https://search.marginalia.nu/screenshot/245804). Seem to come
| in at on average 17 Kb, based on a sample size of 550K
| screenshots.
| codetrotter wrote:
| That is super neat! Did you end up having any
| users/customers?
| mk_stjames wrote:
| I've done this for a project long ago, incredibly lazily, by
| using chrome/chromium to PDF and piping to a PDF to SVG tool.
| There are a few PDF to SVG pathways, I remember it using Cairo
| and the whole thing was quick and consistent.
| crazygringo wrote:
| That was my first thought as well.
|
| I'm genuinely curious if there are any advantages in
| Chrome->SVG as opposed to Chrome->PDF->SVG.
|
| Are there any graphical effects (e.g. produced by CSS, like
| blurry text shadows or something) that PDF can't render without
| falling back to bitmap but SVG can?
|
| Or is there other data that SVG usefully preserves that PDF
| discards, such as actual source text strings used for text? (As
| opposed to PDF where getting text out, e.g. when copying to
| clipboard, usually involves a lot of ugly "reverse
| engineering".)
| femto113 wrote:
| I think the path is more clearly thought of as HTML+CSS ->
| display list -> *. The display list is some abstract
| definition of what needs to be drawn by a renderer.
| In theory anything that fully describes all possible operations
| works, including bespoke things like SkPicture or general purpose
| graphical languages like SVG or PostScript. In practice
| there's never a single language that can describe everything,
| because display capabilities evolve and new operations are
| added all the time (e.g. advanced typography features for
| fonts). PDF can cover a really broad set of use cases, but it
| also wasn't designed as an intermediate format (it was
| closely tied to the PDF reader) so it's easier to get into
| than out of. SVG is possibly a better candidate, as it is
| already used effectively as an intermediate representation
| (e.g. D3.js "renders" to SVG).
| aidos wrote:
| Neither pdf nor svg does text layout. They're both pretty
| similar really, though the pdf spec is really deep and broad
| to cater for a million things.
|
| My advice to everyone re pdfs is to crack them open by
| running `mutool clean -d file.pdf` and opening in a text
| editor. They're just a tree (well, graph, I guess) of obvious
| objects.
|
| Ps: mutool convert does a good job of converting from pdf to
| svg in a fairly faithful way.
| DrewADesign wrote:
| Do you mean there's no _dynamic_ text layout? Svg and pdf
| have perfect text placement capability, but I've never
| even looked to see if it supports defining broadly
| applicable rules for text presentation.
| aidos wrote:
| I mean there's no layout engine to do things like
| wrapping and line height etc. Everything is explicitly
| positioned.
|
| PDF seems a bit more bonkers because you render text as
| strings of glyphs and the conversion back to text is an
| afterthought. There's a ToUnicode map that says "glyph 8
| in the embedded font is an 'X'" but that's there for copy
| pasting / searching - not for rendering. PDFs are built
| to render glyphs at positions.
|
| Edit: to go full meta, there are Type3 fonts where each
| glyph itself is defined as a PDF graphics stream.
| Which actually leads you in to what's inside a font. Guess
| what? Lots of them look just like PDFs inside, because
| the glyphs are defined in postscript. Fonts and PDFs
| kinda grew up together, and once you start digging into
| them the similarities are striking.
| ccouzens wrote:
| If you print to PDF, you'll have the page's print css
| applied. And it is probably paginated.
|
| If you go direct to SVG the capture will use the screen css
| and not be paginated.
| mk_stjames wrote:
| Yes, this. In the project I was doing, using chromium as a
| command-line interface I remember having options to do the
| pagination to a custom resolution, which I used to define a
| render 'window' as if the browser screen was on something
| like a 1600x18000 monitor. So I had the entire webpage
| displayed like a full scroll without page breaking like it
| would have if you just printed a PDF from Chrome. And this
| allowed me to then extract this giant full length vector
| graphics result of diagrams and text into a single SVG that
| was perfectly spaced and rendered in the aspect ratio I
| wanted.
| cjr wrote:
| It's also possible to emulate screen media queries[0] so
| that the pdf output uses the regular screen css.
|
| [0] https://chromedevtools.github.io/devtools-protocol/tot/Emula...
| aidos wrote:
| I've been down a bit of this rabbit hole before. We work with
| PDFs, svgs, fonts and chromium too. While I don't have any need
| for this tool itself, I'd highly recommend flicking through this
| article as a nice overview of the graphics / font pipeline.
| bscphil wrote:
| Semi-related, there's this browser extension that somehow manages
| to mangle HTML into SVG with pretty good accuracy.
| https://addons.mozilla.org/en-US/firefox/addon/svg-screensho...
|
| I do stuff like this (vector representations of the DOM) for
| taking screenshots. Why?
|
| 1. High resolution screenshots are great when you're sharing from
| a low resolution device, or when you need to scale them up. I've
| seen enough crappy screenshots of Twitter in YouTube videos to
| last me the rest of my life.
|
| 2. If your device does sub-pixel anti-aliasing, then your
| screenshots all have noticeable color fringing around their text.
| The text rendering is done well before the data hits the buffer
| that the screenshot is capturing. A fun party trick is to
| identify someone's OS based purely on a screenshot of some text
| on a webpage.
|
| 3. On Linux (and maybe elsewhere, IDK), color correction (e.g.
| gamut mapping) is done (in X11) before the pixels get to the
| buffer that you capture. So with most screenshot tools, you end
| up capturing a bunch of distorted colors which you then have to
| map back to sRGB if you want them to look right in color
| calibrated software.
|
| You can frequently get away with printing a PDF and then
| rendering that out to a large PNG. In some cases, though,
| figuring out how to set the page size to match what you see on
| the screen can be near-impossible, and more importantly in
| Firefox there's no way to disable print media CSS when printing a
| PDF. (You can do this in Chromium.) If you need to edit the image
| afterwards or want to put it on a website or something, this is
| far easier to do with the SVG format than with PDF.
| jancsika wrote:
| > Recently, an experimental SVG back-end has been added to Skia.
|
| That's curious.
|
| Anyone know why?
| return_to_monke wrote:
| While I am not a skia person, a use case I could imagine is
| (flutter) web apps.
|
| Flutter currently has 2 ways to run something on the web: 1.
| CanvasKit. Primarily, this uses webgl. Though, the app has to
| download a kind of webGl runtime on the first launch, iirc. If
| the browser does not support openGl, it will use Skia with a
| Canvas frontend, leading to blurry and poor performance results
| 2. webRender.
| This is flutter's way of trying to make an HTML
| DOM, but it's not that great either. It's inconsistent with the
| rest of the flutter implementations, and has performance issues
| because it's not really mature/optimized and has a virtual Dom.
|
| I think an exciting use case would be something like 1. Instead
| of the blurry image and bad performance of canvas redrawing, it
| might try to manipulate an SVG in the browser. This is pure
| speculation tho, correct me if I'm wrong.
| TheRealPomax wrote:
| Calling it "recently" is a bit of a misnomer. The
| "experimental/svg/model/..." content was added almost five
| years ago.
| simpleintheory wrote:
| Interesting. Wonder how easy it would be to generalise this--
| turn into an API that gives out some image data that could be in
| turn converted to PDF, SVG, PNG, you name it... though not sure
| how the data would be structured
| imhoguy wrote:
| You can make PDF or PNG from SVG.
| fathyb wrote:
| I had a lot of people reach out this week-end for PDF support,
| so I'm planning on implementing it with PNG support this week.
| Thanks to Skia, it should just require a few lines of code.
| justinclift wrote:
| PDF or PNG? It's not clear from your comment. :)
___________________________________________________________________
(page generated 2022-11-13 23:00 UTC)