Fileserver SSD Cache
posted by Saham Ali on Aug. 9, 2016, 9:10 a.m.
34 Responses     2 Plus One's     1 Comments  
Hey guys, just wanted to poke the hive mind and see what, if anything, people are doing to utilize SSD flash storage as a cache layer on top of their file servers. In the past I have used LSI CacheCade, which helped, but only with files that are read over and over again; new files being read or written wouldn't necessarily take advantage of the SSD. Great for DB transactions and repeated reads and writes of the same files, not for new frames and new data.
Is there any roll-your-own solution, other than going with ZFS, where you can have a cache pool of storage that presents itself to the file system, where all initial reads and writes take place and the writes are then flushed to the spinning disks later? Or even a way to have a folder (maybe using a directory junction) appear as if it's in the same space as the root of the file system, but actually sit on the SSDs?
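(For the junction/symlink idea, here is a minimal sketch on a Linux filer, assuming the SSDs are already formatted and mounted at /mnt/ssd and the share lives under /srv/projects; both paths are placeholders, and on Windows the rough equivalent is a junction created with mklink /J.)

    # carve out an SSD-backed directory for the hot data (paths are hypothetical)
    mkdir -p /mnt/ssd/outputs

    # option A: bind-mount it over a directory inside the share, so clients keep
    # their usual path; persist it with an fstab line such as:
    #   /mnt/ssd/outputs  /srv/projects/outputs  none  bind  0 0
    mkdir -p /srv/projects/outputs
    mount --bind /mnt/ssd/outputs /srv/projects/outputs

    # option B: a plain symlink instead (Samba may need "follow symlinks = yes")
    # ln -s /mnt/ssd/outputs /srv/projects/outputs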
Thanks!

Thread Tags:
  discuss-at-studiosysadmins 

Response from Anonymous @ Sept. 19, 2016, 8:45 a.m.

At a previous place we spent some time trying to make an NFS caching box with fs-cache (an Avere "on the cheap") but found stability was dreadful. We never solved this and bought an Avere cluster which worked perfectly from day one.
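(For anyone who still wants to kick the tyres on the fs-cache route before spending money, a minimal sketch on a CentOS/RHEL render node, assuming a local SSD mounted at /cache; the paths and server name are placeholders.)

    # install the userspace cache daemon
    yum install -y cachefilesd

    # point it at the local SSD by editing /etc/cachefilesd.conf:
    #   dir /cache/fscache
    #   tag renderfarm
    mkdir -p /cache/fscache
    service cachefilesd start        # systemctl start cachefilesd on EL7+

    # mount the filer with the fsc option so NFS reads land in the SSD cache
    mount -t nfs -o fsc filer:/projects /mnt/projects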

 


--
Simon Burley
RPS Film Imaging Ltd
Mobile: 07702 732 655


Response from Peter Devlin @ Aug. 9, 2016, 1:55 p.m.
The hive mind tends to produce better results with more info up front. What is your data access model? Block or file? Stream or random? Underlying network transport protocol (NFS vs SMB vs serial)? Numbers of clients? Client and server types? Budget constraints?
We have been using CacheCade for a while on silo CentOS servers rather than central filestore but it's a very noughties cheap-and-cheerful solution. Works well IF you are file-based as opposed to block-based and IF your cache is sized such that a cache flush cannot be caused by large files e.g. FX sims or stream of 9k frames. When you get away from that pattern you need something else e.g. ZFS via good HBAs for block.
From what I've seen on vendor roadmaps, broadly-speaking the question of SSD-based caching for spinning disk is likely to be moot by this time next year. Cost per GB for SSD and flash is predicted to be so far through the floor that spinning disk seems set to go the way of the dodo. Add smart tiering algorithms that are now available in certain filesystems. Net result is that I'd be reluctant to pony up for anything to do with spinning disk in the next 12 months. That buzzword SDS now seems likely to become reality and make such investments look unwise.

Sit on your hands is never really good advice though. Our primary filestore is Isilon so the bulk of my caching needs are met on the cluster. However I do have a pending project with a work profile that seems likely to overtax my Isilon unless I'm prepared to use S series nodes. Therefore on the "roll your own" front I'm also interested in something that might sit in front of Isilon but work well with it. I've been noodling solutions but most of them, in the short term, look like SSD buckets with script-driven preloading of 'hot' data i.e. not at all satisfactory except for edge cases.
So, what's your requirement?

--
Thanks,

Peter Devlin
Head of IT
Tel: +44 (0)141 572 2802




A X I S

Axis Productions Limited

7.1 Skypark 1, Elliot Place

Glasgow, G3 8EP

axisanimation.com


-----------------------------------------------------------------------

Axis Animation (Axis Productions Ltd)
Registered in Scotland: SC306712

Registered Office: Suite 7-1, The Skypark, 8 Elliot Place, Glasgow G3 8EP


Response from Saham Ali @ Sept. 21, 2016, 11:15 p.m.

Each node was a dual Xeon, 24 cores, no HT.
By default we ran 0 or AUTO threads, and we reduced it to 2 and 4.


On Sep 21, 2016 8:33 PM, "Todd Smith" <todd@sohovfx.com> wrote:
If you don't mind my asking, how many threads were you running, versus how many did you reduce to?


Todd Smith
Head of Information Technology
soho vfx | 99 Atlantic Ave. Suite 303, Toronto, Ontario M6K 3J8
office: (416) 516-7863  fax: (416) 516-9682  web: sohovfx.com


Response from Todd Smith @ Sept. 21, 2016, 8:40 p.m.
Oh, ok, guess we should have started with what the NAS was to begin with!
That's definitely something you will need to spend money on.  
Todd Smith
Head of Information Technology
soho vfx | 99 Atlantic Ave. Suite 303, Toronto, Ontario M6K 3J8
office: (416) 516-7863  fax: (416) 516-9682  web: sohovfx.com

The current filer is just a server 2012 with 12 x 2 TB WD black drives. When the situation happens the disk queue on the volume goes to 50-80. Flash is gonna be the short term fix till we can get  a proper NAS with caching in place.



Response from Todd Smith @ Sept. 21, 2016, 8:35 p.m.
If you don't mind my asking, how many threads were you running, versus how many did you reduce to?


Todd Smith
Head of Information Technology
soho vfx | 99 Atlantic Ave. Suite 303, Toronto, Ontario M6K 3J8
office: (416) 516-7863  fax: (416) 516-9682  web: sohovfx.com

@Todd
Yes, I did reduce the threads, it helped somewhat. Not alot.
We just have to adjust the threads on certain jobs due to the fact the spherical nodes are very CPU intensive, and require all threads.



Response from Saham Ali @ Sept. 21, 2016, 8:30 p.m.

@Todd
Yes, I did reduce the threads; it helped somewhat, but not a lot.
We just have to adjust the threads on certain jobs, because the spherical nodes are very CPU intensive and require all the threads.


On Sep 21, 2016 6:56 PM, "Todd Smith" <todd@sohovfx.com> wrote:
@Saham, did you take my advice and scale back the number of threads Nuke is using on the render nodes? That is my primary recommendation. Nuke runs relatively quickly even with 1 thread per machine, additionally this will ease load on your file servers. Yes Nuke jobs will require more processing time, however, given that you are bringing your NAS to a standstill, this will be most likely be balanced by an improvement in NAS response times to the render nodes.
Additionally you may ask artists to pre comp out finished parts of the comp that will not be revisited. This will bring down the number of read nodes needed in the actual comp, and thus the necessary read traffic from the NAS to the render node. We've found many "new" comp artists aren't familiar with comp'ing on large NAS based networks (full rez, no proxy, no pre comp all the time), so it may be something that needs to be enforced for them to get used to it.
Both of these feats require $0 investment.
Cheers,
Todd Smith
Head of Information Technology
soho vfx | 99 Atlantic Ave. Suite 303, Toronto, Ontario M6K 3J8
office: (416) 516-7863  fax: (416) 516-9682  web: sohovfx.com


Response from Saham Ali @ Sept. 21, 2016, 8:30 p.m.

The current filer is just a Windows Server 2012 box with 12 x 2 TB WD Black drives. When the situation happens, the disk queue on the volume goes to 50-80. Flash is gonna be the short-term fix till we can get a proper NAS with caching in place.


On Sep 21, 2016 7:53 PM, "Bruce Dobrin" <brucedobrin@hotmail.com> wrote:
Not sure if it's really going to help that much to home-grow a giant SSD array. Normally the bottleneck seems to be head contention (the filer heads can't process all the IOPS) rather than the speed of the disks. We ran into a 41,000 IOPS limit on our older HNAS HUS150 with 3090 heads; our new G400 with 4080 heads seems to be good up to 60K IOPS. In both cases, the disks didn't even seem to be breathing hard.
My $.02
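(One quick way to confirm whether you are head-bound on IOPS rather than disk-bound is to hammer the mount from a couple of clients with fio and compare against the filer's rated ceiling; a rough sketch, with the mount point, sizes and job counts made up.)

    # random 4K reads against the filer: 4 workers, queue depth 32, 60 seconds
    fio --name=nfs-randread --directory=/mnt/filer/bench \
        --rw=randread --bs=4k --size=2G --numjobs=4 --iodepth=32 \
        --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting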




Response from Todd Smith @ Sept. 21, 2016, 7 p.m.
@Saham, did you take my advice and scale back the number of threads Nuke is using on the render nodes? That is my primary recommendation. Nuke runs relatively quickly even with 1 thread per machine, and this will also ease the load on your file servers. Yes, Nuke jobs will require more processing time; however, given that you are bringing your NAS to a standstill, this will most likely be balanced by an improvement in NAS response times to the render nodes.
Additionally you may ask artists to pre comp out finished parts of the comp that will not be revisited.  This will bring down the number of read nodes needed in the actual comp, and thus the necessary read traffic from the NAS to the render node.  We've found many "new" comp artists aren't familiar with comp'ing on large NAS based networks (full rez, no proxy, no pre comp all the time), so it may be something that needs to be enforced for them to get used to it.
Both of these feats require $0 investment.
Cheers,
Todd Smith
Head of Information Technology
soho vfx | 99 Atlantic Ave. Suite 303, Toronto, Ontario M6K 3J8
office: (416) 516-7863  fax: (416) 516-9682  web: sohovfx.com
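(For what it's worth, the thread cap can be tried per job from the command line before changing anything farm-wide; a rough example assuming Nuke's -m (threads), -c (cache memory) and -F (frame range) flags, with a made-up script path and range.)

    # render frames 1001-1100 with Nuke held to 2 threads and an 8 GB cache,
    # instead of letting it grab every core on the node and hammer the NAS
    nuke -x -m 2 -c 8G -F 1001-1100 /jobs/show/sh010/comp_v023.nk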

@Todd,
The issue we have currently is the READs are killing the performance on the server, we are writing the Writes to the local node, and then copying back to the server after. Too many large files needing to be read by nodes, scaled by 40 nodes. and then artists trying to use that same server to continue comping.
Really am considering just building a massive Flash storage box, and creating a symlink to the outputs directory for the comp and CG outputs.

Response from William Sandler @ Sept. 20, 2016, 10:10 a.m.
I read through the SSA archives and it seems you were/are using one of those ASRock motherboards with 16+ SATA ports. Have you benchmarked these against LSI HBAs (not RAID controllers)?

Also, still curious about your use of non-ECC RAM with ZFS. How do you "stomach" it?
You mention overclocking; is the CPU the bottleneck at this point? Are you mostly serving NFS or SMB (Samba) clients? I ask because I'm curious how efficiently Samba 4.x uses multiple cores/threads these days. (We use the integrated SMB 2.1 on Solaris 11.3.)
Lastly, just out of curiosity, what case/housing are you using?



William Sandler
All Things Media, LLC
Office: 201.818.1999 Ex 158
william.sandler@allthingsmedia.com
On Tue, Sep 20, 2016 at 9:31 AM, Jorg-Ulrich Mohnen <content@studiosysadmins.com> wrote:

The Jim Henson Studios has used the Stratoflash setup posted below for almost two years now, in its entirety, to home-build flash storage systems that are quite unique. I architected the whole shebang, the BOM and the materials build in 2014/2015 with Brocade and Samsung. I believe you need to understand the flow of electrons (data) across your own facility, right down to the last speck and drop. Outsourcing storage and network topologies is a BIG no-no in this business, and placing the onus of responsibility on third parties is asking for it. You may feel safer in your job by calling out to a storage vendor when there is an issue, so the onus is not on you, but no one is helped by such a belief or strategy. Soon gone will be the days when YOU are not IT (old school....).

And this stuff is not rocket science.

(1) We use Brocade 10G/40G switches, so clients are each connected with a 10G card directly to three 48-port 10G Brocades. Our core or primary switch is a 24-port 40G storage switch. These can certainly be costly, but there is good reason. Each switch is bundled to the core at dual 2x40G, or 80G. Each server is at 40G to the core. Each and EVERY client is at 10G to the switches, via a SANLink2/TH2 for Apple, while Windows/Linux clients enjoy 10G PCI-E cards.

(2) Each storage node can actually support 24x 2TB Samsung flash drives, totalling 48TB of RAW flash. We gcluster the nodes with some secret sauce, we have obviously tweaked the shite out of the Debian Linux inside each storage node, and we have tweaked the zpooling, ZIL'ing and caching. But this too can be met with most people's Linux experience.

(3) When I say we cover ALL VFX and post production on these storage nodes, I mean pretty much all of it: DaVinci, Premiere, Final Cut, AVID, Nuke, Maya, Max, GPU Redshift rendering, Photoshop, et cetera, EVEN A RUNTIME PRODUCTION MOCAP STAGE. No issues whatsoever, no slowness. Even a fully outgrown deployment of an asset database system. That's hundreds of operations per second, from file formats like multiband EXR, to database read/write IOs, to many many bucket renders from Redshift, to Premiere caching and simultaneous workflows and sharing, and all at 2K/4K.....

(4) We have done recent R&D on the systems and cluster nodes. Specific to this thread: if we try to force in a set of high-end RAID controllers, the shit goes to shit, instantly. There is something pure and unique about a proper zpool strategy.

We have recently decided to add overclocking R&D into the mix, as the MOBOs we use are specifically tuned for this and our machine rooms have 6 tons of AC.

Jorg

PS: for more, please text me at 310-951-7331; I am concluding my responses to this thread. I will call back.



Response from Saham Ali @ Sept. 19, 2016, 5:50 p.m.
+1 Yeah, also very curious about the setup at that price point.

On Mon, Sep 19, 2016 at 4:39 PM, David Leach <dleach@wetafx.co.nz> wrote:

I'm also super curious about this setup. If you're successfully serving out 30TB of pure flash with the connectivity and hardware you specified, for a cost of $10k, I am genuinely interested.

  • You call it a cluster of machines; are you actually clustering them in some way, or is each one serving its own NFS export, so that you have to manage manually which data goes where?
  • Why do you have a graphics card in a storage node?
  • Can you confirm peak throughput numbers in production? We've found that, even with a large number of clients, aggressive concurrent access is not that common when you have a decent number of nodes.
  • What kind of connectivity do the clients have?




From: studiosysadmins-discuss-bounces@studiosysadmins.com <studiosysadmins-discuss-bounces@studiosysadmins.com> on behalf of Jorg-Ulrich Mohnen <content@studiosysadmins.com>
Sent: Tuesday, 20 September 2016 1:27 a.m.
To: studiosysadmins-discuss@studiosysadmins.com
Subject: Re: [SSA-Discuss] Fileserver SSD Cache

At the Jim Henson Studios, we deploy home-built storage (we call it Stratoflash) as 30TB storage nodes that zpool 2TB Samsung EVO Pro drives, a set in each storage node. We do caching and ZIL'ing on a pair of zpool-mirrored ioFX cards on the PCI-E bus. It's 100% pure flash. There are 10x servers, totalling a 1/4 petabyte of 100% pure and VERY fast storage. Our BOM pricing is around $10,000 per storage node. So for $100,000 we have a quarter of a petabyte of most likely the fastest storage in the west.

Absolutely beautiful..... Started a 15TB rsync over 40G from serverA to serverG on Friday. This COMPLETED Saturday evening, and I ran du -kh . on well over 1 million production files, and the command completed in 12.78 seconds.

It is simplicity at its purest form. A mobo, a 1300W power supply, 128GB DDR3 RAM (error checking off in BIOS), an Intel 4940K CPU, two ioFX cards with Micron memory for zpool ZIL'ing (logging and caching on mirrored 456GB ioFX cards on PCI-E x16), a Skyhawk 40G network card, a small VGA/DVI Radeon graphics card, an ICY Box housing 12x Samsung 2TB 840 EVO Pro drives, as well as a 2x ICY Box boot housing with a set of 2x 128GB Samsung 840 EVO Pro drives.

One power cable and one SFP rat tail hang off the back of one of these cluster nodes. Also simplicity at its purest form.

Here it is; how can you tell if you have really fast and responsive storage? Forget about all the other rules, as they do not apply.

(1) You run du -kh . on the root storage volume, on say 18TB of data totalling about 2 million files = = = the command completes in under 20 seconds.

(2) You can effortlessly move TBs of data every evening to and from other storage nodes and nearline storage, at a speed that is absolutely pure flash and blazingly fast, well above 2GB/second.

(3) You can attach 20x of the fastest render nodes in the west, attach another 50x or so workstations in full 2D and 3D production, attach 2x high-end color correction systems, AND attach up to 10x edit bays doing 2K/4K work simultaneously, and this little thing handles the load effortlessly, with no hiccups WHATSOEVER, and for over 18 months now.

I've invited ALL to come see the solution we have architected, and no one came. I guess everyone is either happy with the latest SSD technology they purchased, likely for a whole lot more, or is doing other things. Regardless, the solution we have architected for all this GPU rendering, editorial, database access (Postgres), et cetera has held up for over 18 months and through several productions.

Not a single issue.

Jorg (stratoflash.com among other things.......)
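(For anyone wanting to experiment with a similar all-flash node without the secret sauce, a rough ZFS-on-Linux sketch; device names are placeholders, the vdev layout is a guess since Jorg doesn't say how the 12 drives are arranged, and note that ZFS will mirror a log (SLOG) device but cache (L2ARC) devices are always striped, never mirrored.)

    # twelve SATA SSDs as six mirrored pairs (trades capacity for resilience)
    zpool create tank \
      mirror sda sdb  mirror sdc sdd  mirror sde sdf \
      mirror sdg sdh  mirror sdi sdj  mirror sdk sdl

    # mirrored SLOG plus striped L2ARC carved out of the two PCIe flash cards
    zpool add tank log mirror nvme0n1p1 nvme1n1p1
    zpool add tank cache nvme0n1p2 nvme1n1p2

    zpool status tank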



Response from William Sandler @ Sept. 19, 2016, 3 p.m.
Jorg, I like your self built solution, and we do something similar, but I'm confused about a couple things you said probably due to me misreading. My questions also assume you are using ZFS because of your use of terms like zpool, zil, etc.
1. With 12x 2TB drives, how are you achieving 30TB?
2. Assuming you meant 24TB, do you have one big striped VDEV per pool?
3. Once again, assuming ZFS, why non-ECC RAM?
4. Are they EVO or Pro drives? I don't think there is such a thing as EVO Pro.
5. The 840 didn't come in a 2TB version.
6. What network protocol are you using? SMB or NFS?
7. How do the drives connect to the system? LSI card?
8. What QSFP+ switch do you recommend?



William Sandler
All Things Media, LLC
Office: 201.818.1999 Ex 158
william.sandler@allthingsmedia.com

Response from Saham Ali @ Sept. 19, 2016, 2:35 p.m.

@Jorg Hit me up off list.

Response from Anthony Gelatka @ Sept. 19, 2016, 10:35 a.m.
There's always throttling: http://docs.thinkboxsoftware.com/products/deadline/8.0/1_User%20Manual/manual/repository-config.html#throttling
I think this was what Peter was referring to, I don't think it works per job though.




Response from Saham Ali @ Sept. 19, 2016, 10 a.m.
@Todd,
The issue we have currently is that the READs are killing the performance on the server; we are already writing the renders to the local node and then copying them back to the server afterwards. Too many large files need to be read by the nodes, multiplied across 40 nodes, and then artists are trying to use that same server to continue comping.
I'm really considering just building a massive flash storage box and creating a symlink to the outputs directory for the comp and CG outputs.
On Fri, Sep 16, 2016 at 11:46 AM, Todd Smith <todd@sohovfx.com> wrote:
If the problem he is encountering is fast Nuke writes killing his NAS performance, then client-side caching is going to have little to no effect.
Since each render node is reading a unique set of frames, and successive renders of the same frame will rarely hit the same render node, client-side caching will not be viable. Avere will fare slightly better, mainly because of delayed write-back, which can help smooth out performance, and because the chances of the data already being in the cache are higher, due to the artist working on the NAS-based footage at their desk.


Todd Smith
Head of Information Technology
soho vfx | 99 Atlantic Ave. Suite 303, Toronto, Ontario M6K 3J8
office: (416) 516-7863  fax: (416) 516-9682  web: sohovfx.com


Response from Saker Klippsten @ Sept. 19, 2016, 9:45 a.m.
I was only talking about the writes, in response to Todd's comment. We do not copy data over to the render node locally first.
So typically most people set their write node paths to the file server; we set our write path local to the render node, then copy when done via a post-render command. So there is no network overhead from writing 150MB EXR files on top of a huge load of reads. We forget the resources needed to manage writes with all those open files; this can also impact server resources, as the server tries to feed the writes first.
Sent from my iPhone
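(A minimal sketch of that write-local-then-copy pattern as a post-render hook; the paths, hostname and environment variables are hypothetical, not Saker's actual tooling.)

    #!/bin/bash
    # post-render hook: push frames written to the local SSD back to the filer
    set -e

    LOCAL_OUT=/local_ssd/renders/${JOB_ID}          # where the write node pointed
    REMOTE_OUT=filer:/projects/${SHOW}/renders/${JOB_ID}

    # one sequential copy per job instead of thousands of small NFS writes;
    # --partial lets an interrupted transfer resume
    rsync -a --partial "${LOCAL_OUT}/" "${REMOTE_OUT}/"

    # only clean up the local copy once the transfer has succeeded
    rm -rf "${LOCAL_OUT}"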
On Sep 19, 2016, at 6:31 AM, Saham Ali <saham.ali@gmail.com> wrote:

@SakerThe problem is, with the nuke scripts we are running, the read nodes amount to nearly 14GB in EXR's (on a good day) for 1 frame some sequences 10K frames or more, to have that much data have to get copied or cached over to the node itself, I cant imagine it making copies along with 40 other machines helping too much, unless you feel that the file copy itself will gain much more performance as opposed to having the application read directly to the server.  The performance gains can be that great?



Response from Saham Ali @ Sept. 19, 2016, 9:35 a.m.
@Saker The problem is, with the Nuke scripts we are running, the read nodes amount to nearly 14GB of EXRs (on a good day) for 1 frame, and some sequences are 10K frames or more. With that much data having to be copied or cached over to the node itself, along with 40 other machines doing the same, I can't imagine it helping too much, unless you feel that the file copy itself will gain much more performance than having the application read directly from the server. Can the performance gains really be that great?


On Fri, Sep 16, 2016 at 12:38 PM, Saker Klippsten <sakerk@gmail.com> wrote:
We write files to the local filesystem (SSD) on the render/ workstations nodes and then upon completion - transfer back to the file server via various file copy methods depending if the destination for those renders are local onsite or remote. We do this for all jobs types... Nuke .. Vray.. Houdini...
We do this as well for saving files out of maya or Nuke. Everything saves local and then gets pushed to the file system. Pretty much all applications suck at writing to a file server under load.
Doing a file copy is much more efficient and clean. This reduces the write load and other various locks and process that the OS/filesystem server has to manage. This frees up resources for much better read performance by the clients.



Response from Fabien ILLIDE @ Sept. 19, 2016, 8:29 a.m.

Hi!

If you want to lower latency AND have a great read/write cache on SSD for all your NFS/S3 storage (on-premise or in-cloud), have a look at the Avere FXT (5400 or 5600 for SSD).

It can greatly improve your workflow, without changing the storage servers and without changing paths either!

It is a 3-node cluster (one node can fail/reboot/upgrade without interruption) that you place in front of all your existing servers.

It can handle a lot of requests and users, and can help you migrate to the cloud too!

We use it for several clients.

Get in touch for more details if you want!

Fabien

-- 
Fabien ILLIDE
IMAGE IN NETWORK S.A.S.
6 rue Bichat 75010 PARIS
Tel Bureau  : 0033 (0) 184 177 744
Site web : https://www.imageinnetwork.fr/
Cloud Rendering: http://imagein.cloud/
Boutique: https://shop.imageinnetwork.fr/
Twitter: https://twitter.com/ImageInNetwork
Linkedin: https://linkedin.com/company/image-in-network


Response from Brian Krusic @ Sept. 16, 2016, 7 p.m.
I cannot emphasize enough what a fantastic technology Alacritech is/was.
Too bad they went bust; my wife did a photoshoot for their next press junket, similar to the one I was on holding my surf board.
Oh well, shoulda known any place that thought I was cool was gonna go bust :)
- Brian
"A good day is when no one shows up... and you don't have to go anywhere."
On Sep 16, 2016, at 3:19 PM, Saham Ali <saham.ali@gmail.com> wrote:
Alacritech  is dead?


On Fri, Sep 16, 2016 at 5:52 PM, Brian Krusic <brian@krusic.com> wrote:
That's an interesting point.
The L2ARC is not really a read cache though, but an extension of ARC, which gets flushed often.
A read cache as we're thinking of it is Avere, the defunct Alacritech, etc.
I think Avere has a definite use case in cloud rendering for sure. But with the advent of HUUUUGE cheap SSDs coming online that rival spinning rust in terms of size, I'd think their LAN implementations are going to go by the wayside.
Now for WAN, yep, a requirement for cloud rendering.
- Brian
A good day is when no one shows up... and you don't have to go anywhere."
On Sep 16, 2016, at 12:49 PM, William Sandler <william.sandler@allthingsmedia.com> wrote:
Yeah, I believe about 1/8th of the L2ARC size is carved out of ARC. It's frustrating that SSD read cache is useless on ZFS except for a handful of scenarios.
Link to arcstat.pl in case anyone wants to report back successful uses of L2ARC and share their tunables. https://github.com/mharsch/arcstat    




William Sandler
All Things Media, LLC
Office: 201.818.1999 Ex 158
william.sandler@allthingsmedia.com
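(For anyone wanting to measure this on their own pool before writing L2ARC off, a quick sketch; pool and device names are placeholders, and the arcstat field names follow the script linked above.)

    # add a cache device, let the farm run for a while, then look at the numbers
    zpool add tank cache nvme0n1

    # per-vdev bandwidth and IOPS, including the cache device, every 5 seconds
    zpool iostat -v tank 5

    # ARC vs L2ARC hit rates over time
    ./arcstat.pl -f time,read,hit%,l2read,l2hit% 5

    # if l2hit% never climbs out of the single digits, reclaim the device
    zpool remove tank nvme0n1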
On Fri, Sep 16, 2016 at 3:37 PM, Brian Krusic <brian@krusic.com> wrote:
L2ARC is pretty useless in our env.
I'd ditch it, as some ARC is used to manage L2ARC.
- Brian
A good day is when no one shows up... and you don't have to go anywhere."
On Sep 16, 2016, at 12:02 PM, William Sandler <william.sandler@allthingsmedia.com> wrote:
For the ZFS users: Is anyone actually hitting L2ARC cache with their render farm?  Our hit ratio on L2ARC is generally below 10%.  RAM cache (ARC) does very well but SSD cache (L2ARC) barely gets touched.




William Sandler
All Things Media, LLC
Office: 201.818.1999 Ex 158
william.sandler@allthingsmedia.com
On Fri, Sep 16, 2016 at 1:16 PM, Ben De Luca <bdeluca@gmail.com> wrote:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-fscache.html

free, but I don't know about stability.
On 16 September 2016 at 19:38, Saker Klippsten <sakerk@gmail.com> wrote:
We write files to the local filesystem (SSD) on the render/ workstations nodes and then upon completion - transfer back to the file server via various file copy methods depending if the destination for those renders are local onsite or remote.  We do this for all jobs types... Nuke .. Vray.. Houdini... 
We do this as well for saving files out of maya or Nuke. Everything saves local and then gets pushed to the file system. Pretty much all applications suck at writing to a file server under load. 
Doing a file copy is much more efficient and clean.  This reduces the write load and other various locks and process that the OS/filesystem server has to manage.  This frees up resources for much better read performance by the clients. 


Sent from my iPhone
On Sep 16, 2016, at 8:46 AM, Todd Smith <todd@sohovfx.com> wrote:

If the problem he is encountering is fast Nuke writes killing his NAS performance, then client side caching is going to have little to no effect. 
Since each render node is reading a unique set of frames, and successive renders of the same frame will rarely hit the same render node, client side caching will not be viable.Avere will fair slightly better mainly because of delayed write back which can help smooth out performance, and the chances of the data being in the cache already are higher, due to the artist working on the NAS based footage at their desk.


Todd SmithHead of Information Technology
soho vfx | 99 Atlantic Ave. Suite 303, Toronto, Ontario M6K 3J8office: (416) 516-7863 fax: (416) 516-9682 web: sohovfx.com

@SahamSorry, totally missed this reply last week. How about this as a $free option. Please do get in touch with Thinkbox support. They have a couple of scripts for Maya to cache locally all/some(filtering system) asset data and dynamically rewire the scene file up, before proceeding with render/sim/bake/script/bifrost/export job. This code will see the light of day for everyone, but we have much bigger plans in this space. Of course, using Avere or PerfAccel will really help here, assuming the budget is available.
On 16 September 2016 at 16:10, Rob Tomson <content@studiosysadmins.com> wrote:

If you have a Linux farm with local SSDs accessing storage over NFS, I would check out PerfAccel (http://datagres.com/perfaccel_caching.html). It's a persistent client-side NFS cache and works very well to reduce filer load and bring the data closer to the compute nodes.

 

Rob


To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe


To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe

To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe

To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe


To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe

To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe


To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe

To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe


To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe



--
         ,\|//,
        ( o - )
-oOO--(_)--Ooo-----------------
Saham Ali
Founder/Systems EngineerDvNT Technologies
saham@dvnttechnologies.com
407.729.3584 - Direct

  .ooO
  (    )     Ooo.
--\  ( ------(    )-------------------
    \_)       )  /
              (_/ To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe


0 Plus One's     0 Comments  
   

Response from Brian Krusic @ Sept. 16, 2016, 6:50 p.m.
Well, unfortunately, yeah.
We've 2 of their units; fantastic and FAST.
I think I can keep 'em going if the mobo dies as well, got some plans.
No need to move off 'em yet; they've been going strong for ~3 years now.

- Brian
"A good day is when no one shows up... and you don't have to go anywhere."
On Sep 16, 2016, at 3:19 PM, Saham Ali <saham.ali@gmail.com> wrote:
Alacritech  is dead?




0 Plus One's     0 Comments  
   

Response from Saham Ali @ Sept. 16, 2016, 6:25 p.m.
Alacritech is dead?



0 Plus One's     0 Comments  
   

Response from Brian Krusic @ Sept. 16, 2016, 5:55 p.m.
That's an interesting point.
The L2ARC is not really a read cache, though, but an extension of ARC which gets flushed often.
A read cache like what we're thinking of is Avere, the defunct Alacritech, etc.
I think Avere has a definite use case in cloud rendering for sure. But with the advent of HUUUUGE cheap SSDs coming online that rival spinning rust in terms of size, I'd think their LAN implementations are going to go by the wayside.
Now for WAN, yep, a requirement for cloud rendering.
- Brian
"A good day is when no one shows up... and you don't have to go anywhere."

0 Plus One's     0 Comments  
   

Response from William Sandler @ Sept. 16, 2016, 3:50 p.m.
Yeah, I believe 1/8th of the L2ARC size is carved out of ARC. It's frustrating that SSD read cache is useless on ZFS except for a handful of scenarios.
Link to arcstat.pl in case anyone wants to report back successful uses of L2ARC and share their tunables. https://github.com/mharsch/arcstat
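For a quick spot-check without graphing anything, here's a minimal sketch, assuming ZFS on Linux where the ARC kstats are exposed at /proc/spl/kstat/zfs/arcstats (the hits/misses/l2_hits/l2_misses counters); arcstat.pl reports the same counters over time:

#!/usr/bin/env python
# Minimal ARC / L2ARC hit-ratio check for ZFS on Linux.
# Assumes kstats at /proc/spl/kstat/zfs/arcstats with the usual
# hits, misses, l2_hits, l2_misses fields.

def read_arcstats(path="/proc/spl/kstat/zfs/arcstats"):
    stats = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            # data lines look like: "<name> <type> <value>"
            if len(parts) == 3 and parts[2].isdigit():
                stats[parts[0]] = int(parts[2])
    return stats

def hit_pct(hits, misses):
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

s = read_arcstats()
print("ARC hit%%:   %5.1f" % hit_pct(s["hits"], s["misses"]))
print("L2ARC hit%%: %5.1f" % hit_pct(s["l2_hits"], s["l2_misses"]))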




William SandlerAll Things Media, LLCOffice:201.818.1999 Ex 158.william.sandler@allthingsmedia.com


0 Plus One's     0 Comments  
   

Response from Brian Krusic @ Sept. 16, 2016, 3:40 p.m.
L2ARC is pretty useless in our env.
I'd ditch it, as some ARC is used to manage L2ARC.
- Brian
"A good day is when no one shows up... and you don't have to go anywhere."
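If anyone does go that route, a rough sketch of checking the header overhead and then dropping the cache vdev; the pool and device names here are made up, so take yours from zpool status:

#!/usr/bin/env python
# Sketch: report how much ARC the L2ARC headers consume (the l2_hdr_size
# kstat), then remove the cache device. "tank" and "nvme0n1p1" are
# placeholders -- substitute your own pool and device.
import subprocess

def arcstat(name, path="/proc/spl/kstat/zfs/arcstats"):
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3 and parts[0] == name:
                return int(parts[2])
    return 0

print("ARC used for L2ARC headers: %.1f MiB" % (arcstat("l2_hdr_size") / float(2 ** 20)))

# Cache devices can be removed from a live pool:
subprocess.check_call(["zpool", "remove", "tank", "nvme0n1p1"])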


0 Plus One's     0 Comments  
   

Response from William Sandler @ Sept. 16, 2016, 3:05 p.m.
For the ZFS users: Is anyone actually hitting L2ARC cache with their render farm? Our hit ratio on L2ARC is generally below 10%. RAM cache (ARC) does very well but SSD cache (L2ARC) barely gets touched.




William Sandler | All Things Media, LLC | Office: 201.818.1999 Ext. 158 | william.sandler@allthingsmedia.com


0 Plus One's     0 Comments  
   

Response from Ben De Luca @ Sept. 16, 2016, 1:20 p.m.
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-fscache.html

free, but I don't know about stability.
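For reference, this is roughly the shape of an fs-cache setup as I understand it: point cachefilesd at an SSD-backed directory and add the fsc mount option. Paths, filer name and export are placeholders, and the commands assume an EL7-style box:

#!/usr/bin/env python
# Rough sketch of wiring up FS-Cache on a render node (RHEL/CentOS 7 style;
# on EL6 substitute service/chkconfig). Filer, export and SSD paths are
# placeholders.
import subprocess

SSD_CACHE_DIR = "/ssd/fscache"   # directory on the local SSD

# cachefilesd stores its cache under the 'dir' given in its config.
with open("/etc/cachefilesd.conf", "w") as f:
    f.write("dir %s\ntag renderfarm\n" % SSD_CACHE_DIR)

subprocess.check_call(["systemctl", "enable", "cachefilesd"])
subprocess.check_call(["systemctl", "start", "cachefilesd"])

# The 'fsc' mount option tells the NFS client to use FS-Cache for this mount.
subprocess.check_call([
    "mount", "-t", "nfs", "-o", "fsc,vers=3",
    "filer01:/projects", "/mnt/projects",
])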


0 Plus One's     0 Comments  
   

Response from Saker Klippsten @ Sept. 16, 2016, 12:40 p.m.
We write files to the local filesystem (SSD) on the render/workstation nodes and then, upon completion, transfer back to the file server via various file copy methods, depending on whether the destination for those renders is local onsite or remote. We do this for all job types... Nuke, V-Ray, Houdini...
We do this as well for saving files out of Maya or Nuke. Everything saves locally and then gets pushed to the file server. Pretty much all applications suck at writing to a file server under load.
Doing a file copy is much more efficient and clean. This reduces the write load and the various locks and processes that the OS/file server has to manage. This frees up resources for much better read performance by the clients.
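A minimal sketch of the post-render "push" step in that pattern; all the paths are placeholders for whatever the farm hands the job:

#!/usr/bin/env python
# Minimal "copy local renders back to the filer" step.
# Paths are placeholders.
import os, shutil

LOCAL_SCRATCH = "/ssd/scratch/job_1234/renders"     # local SSD output dir
FILER_DEST    = "/mnt/projects/shot_010/renders"    # final home on the filer

if not os.path.isdir(FILER_DEST):
    os.makedirs(FILER_DEST)
for name in sorted(os.listdir(LOCAL_SCRATCH)):
    src = os.path.join(LOCAL_SCRATCH, name)
    tmp = os.path.join(FILER_DEST, name + ".part")
    dst = os.path.join(FILER_DEST, name)
    shutil.copyfile(src, tmp)   # one clean sequential stream per frame
    os.rename(tmp, dst)         # publish under the real name so readers never see partials
    os.remove(src)              # free local scratch as we go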


Sent from my iPhone

0 Plus One's     0 Comments  
   

Response from Anonymous @ Sept. 16, 2016, noon
@Todd - Agreed! Please allow me to rephrase. I was suggesting this would specifically help reduce Saham's "Maya" load, as per his 2nd post. By reducing the network filer load from Maya scene file and asset reads off the renderwall, you could say this might help make more bandwidth available for Nuke. The Nuke problem remains. IIRC, I heard somewhere that a future Nuke version would be trying to tackle this issue (whatever that means).


0 Plus One's     0 Comments  
   

Response from Todd Smith @ Sept. 16, 2016, 11:50 a.m.
If the problem he is encountering is fast Nuke writes killing his NAS performance, then client-side caching is going to have little to no effect.
Since each render node is reading a unique set of frames, and successive renders of the same frame will rarely hit the same render node, client-side caching will not be viable. Avere will fare slightly better, mainly because of delayed write-back, which can help smooth out performance, and the chances of the data already being in the cache are higher, due to the artist working on the NAS-based footage at their desk.


Todd Smith | Head of Information Technology
soho vfx | 99 Atlantic Ave., Suite 303, Toronto, Ontario M6K 3J8 | office: (416) 516-7863 | fax: (416) 516-9682 | web: sohovfx.com



0 Plus One's     0 Comments  
   

Response from Anonymous @ Sept. 16, 2016, 11:30 a.m.
@Saham - Sorry, totally missed this reply last week. How about this as a $free option: please do get in touch with Thinkbox support. They have a couple of scripts for Maya to cache all or some (via a filtering system) of the asset data locally and dynamically rewire the scene file before proceeding with the render/sim/bake/script/bifrost/export job. This code will see the light of day for everyone, but we have much bigger plans in this space. Of course, using Avere or PerfAccel will really help here, assuming the budget is available.


0 Plus One's     0 Comments  
   

Response from Rob Tomson @ Sept. 16, 2016, 11:10 a.m.

If you have a Linux farm with local SSDs accessing storage over NFS, I would check out PerfAccel (http://datagres.com/perfaccel_caching.html). It's a persistent client-side NFS cache and works very well to reduce filer load and bring the data closer to the compute nodes.

 

Rob


0 Plus One's     0 Comments  
   

Response from Saham Ali @ Sept. 9, 2016, 5:05 p.m.
Peter, any idea on how I would "stagger" jobs via Deadline?
On Wed, Aug 10, 2016 at 6:26 PM, David Leach <dleach@wetafx.co.nz> wrote:

What we've done when faced with this issue is obfuscate the load rather than build something super high performance to accommodate it.


Certain types of FX renders here create enormous burst write load across all our filers, to the point that all other users are affected. (We distribute render load randomly across all filers.) To combat this, all FX render workload was shifted to a single volume on a single filer (at one point a pair of Averes, now an Oracle). FX performance, although reduced, was more consistent, and the facility was no longer at a standstill as a result. We then moved the data off this one volume to the render filers so the results were available to the rest of the users without crushing the filers.


We had another, similar render profile where transient files were created in quick succession and removed once consumed. We again used Averes sized to take the transient workload to protect the rest of the filers. This didn't work quite as well, as we'd underestimated how much of a hammer 500+ render nodes could be to 4x Averes, but the principle works.


If buying more stuff isn't an option, separating load profiles, if possible, is a good way to provide better consistency for higher-priority renders/users. You'll need to figure out where the bottleneck is, of course, in order to separate effectively, and there are some obvious costs to this strategy: you're not likely to be fully utilizing all your hardware, and the management overhead is increased.





0 Plus One's     0 Comments  
   

Response from Peter Devlin @ Aug. 10, 2016, 8 a.m.

Server 2008 storage and file workload and streaming and SMB and hundred plus Nuke nodes? I feel your pain. I was there some years ago. From low cost POV there are a couple of things you can do that may help a little; I'll look at the render node aspect rather than the user experience.
First off, do you have some kind of performance graphing in place so that you can see the trends (network, disk IO, disk access latency) when jobs kick off? Anything along the lines of Zabbix, Nagios or similar would help with initial diagnosis.
Secondly, you should be able to tweak your render control software to stagger the job startup times. Rather than have 100 nodes startup on a job simultaneously, introduce a 5 second staggered start within the job. It should ease your disk IO gridlock and the timing can be tweaked for best effect.
Thirdly, you should be able to have pre- and post-render script execution for a job? If you do, you can script a job introspection that identifies all required assets for a job, and then a single pull process, both pre-render, that preloads an SSD bucket with the assets for the job. The Nuke nodes target the bucket, and this prevents 100 nodes all hitting one disk-IO-bound storage. A post-render script can push the completed frames back. This is effectively 'job-level localization' rather than 'render-node-level localization' (a rough sketch of this, and of the staggered start, follows after the caveat below).
Fourth, look carefully at your Nuke render settings. In smaller setups you can have multiple Nuke instances rendering on a single node (Nuke isn't generally CPU bound), but there is a tradeoff as your node numbers increase and eventually you become IO bound. It may be more efficient to have only a single instance per node, but with the job packet settings tweaked such that larger sequential frame sequences are processed within the job on the node.
Lastly (and I know this is counter-intuitive), when using IO-bound storage silos we found that with comp work there is a sweet spot in the actual number of nodes made available for comping. Adding more nodes into a comping pool does not necessarily get your work done faster; it can actually be slower than using fewer nodes. Think law of diminishing returns for parallelized workflow.
Caveat: Most of this is clunky 20th century sticking plaster for a problem which arguably could / should be solved by 21st century smart software such as OneFS or GPFS, or indeed by applying the money hammer.
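For what it's worth, here is a rough pre-render sketch of the staggered start and the job-level localization ideas above. It is not tied to any particular render manager; the task index, asset list and paths are placeholders to be wired up to whatever your farm software (Deadline etc.) exposes to pre-job scripts:

#!/usr/bin/env python
# Pre-render sketch: stagger the start and pull the job's assets onto a
# local/shared SSD "bucket" once, so 100 nodes don't hammer the filer.
# TASK_INDEX, the asset list and all paths are placeholders.
import os, shutil, time

TASK_INDEX   = int(os.environ.get("TASK_INDEX", "0"))  # e.g. chunk number on this node
STAGGER_SECS = 5                                       # tune for best effect
SSD_BUCKET   = "/ssd/bucket/shot_010"                  # the job's SSD bucket
ASSETS = ["/mnt/projects/shot_010/plates/plate.%04d.exr" % f
          for f in range(1001, 1101)]                  # from job introspection

# 1) Staggered start: don't let every node open the filer at the same instant.
time.sleep(min(TASK_INDEX, 60) * STAGGER_SECS)

# 2) Pull into the SSD bucket (a real version would take a lock so only one
#    node per job does the copying).
if not os.path.isdir(SSD_BUCKET):
    os.makedirs(SSD_BUCKET)
for src in ASSETS:
    dst = os.path.join(SSD_BUCKET, os.path.basename(src))
    if not os.path.exists(dst):
        shutil.copyfile(src, dst)

# The render then reads from SSD_BUCKET; a matching post-render script pushes
# the finished frames back to the filer.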

--
Thanks,

Peter Devlin
Head of IT | Tel +44 (0)141 572 2802




A X I S

Axis Productions Limited

7.1 Skypark 1, Elliot Place

Glasgow, G3 8EP

axisanimation.com


-----------------------------------------------------------------------

Axis Animation (Axis Productions Ltd)
Registered in Scotland: SC306712

Registered Office: Suite 7-1, The Skypark, 8 Elliot Place, Glasgow G3 8EP

0 Plus One's     0 Comments  
   

Response from Todd Smith @ Aug. 9, 2016, 3:55 p.m.

I would suggest constraining the number of threads Nuke is using. For the longest time we didn't allow Nuke to use more than a single thread per machine, because disk IO wouldn't support it. While this would add render time on the comp side, it would vastly ease the load on your filer, and it's a solution that doesn't cost any money whatsoever.
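As a sketch of what the farm's render command might look like with the thread count pinned (install path, frame range and script are placeholders; if memory serves, Nuke's -m flag caps render threads and -x executes the script, but check the docs for your version):

#!/usr/bin/env python
# Launch a farm Nuke render with a capped thread count. All paths and the
# frame range are placeholders; verify the flags against your Nuke version.
import subprocess

NUKE_BIN = "/usr/local/Nuke10.0v4/Nuke10.0"
THREADS  = 1    # one thread per node keeps the per-node IO demand sane
subprocess.check_call([
    NUKE_BIN, "-x", "-m", str(THREADS),
    "-F", "1001-1100",
    "/mnt/projects/shot_010/comp/shot_010_comp_v012.nk",
])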



Todd Smith | Head of Information Technology
soho vfx | 99 Atlantic Ave., Suite 303, Toronto, Ontario M6K 3J8 | office: (416) 516-7863 | fax: (416) 516-9682 | web: sohovfx.com


File-based access. Maya scenes, Houdini sims, Nuke comps, EXRs, 10k+ sequences, with 8-10K stereo SBS frame sizes. Streaming and a bit of random. CIFS/SMB 2.0, roughly a 200-node farm mixed between CPU and GPU rendering, with about 30 users. Servers are currently Server 2008, with multiple heads varying from 12- to 16-drive arrays. Massive budget constraints. Trying to ease the pain with a sub-20k solution.
Mind you, this is all inherited, and we are trying to relieve some of the pain that happens when a Nuke job kicks off and I have 120+ nodes all trying to pull 50-500MB EXRs from the same filer. I have helped somewhat by having the comp team use localization settings in Nuke, but even then it can take a very long time to localize the frame range they need to work with, let alone not having the space to copy the entire sequence over. We have gone over optimization techniques like render regions, using 1/4, 1/8, 1/32 quality in the viewer, etc. It helps, but there just isn't enough disk IO to keep up currently.
We are experimenting with putting comps on an SSD RAID (24 disks or so) and using directory junctions in our content structure, putting output and comp directories on the SSD so the Maya and Houdini guys don't suffer when comp sends off a render.


On Tue, Aug 9, 2016 at 1:49 PM, Peter Devlin <peter@axisanimation.com> wrote:
The hive mind tends to produce better results with more info up front. What is your data access  model? Block or file? Stream or random? Underlying network transport protocol (NFS vs SMB vs serial)? Numbers of clients? Client and server types? Budget constraints?
We have been using CacheCade for a while on silo CentOS servers rather than central filestore but it's a very noughties cheap-and-cheerful solution. Works well IF you are file-based as opposed to block-based and IF your cache is sized such that a cache flush cannot be caused by large files e.g. FX sims or stream of 9k frames. When you get away from that pattern you need something else e.g. ZFS via good HBAs for block.
From what I've seen on vendor roadmaps, broadly-speaking the question of SSD-based caching for spinning disk is likely to be moot by this time next year. Cost per GB for SSD and flash is predicted to be so far through the floor that spinning disk seems set to go the way of the dodo. Add smart tiering algorithms that are now available in certain filesystems. Net result is that I'd be reluctant to pony up for anything to do with spinning disk in the next 12 months. That buzzword SDS now seems likely to become reality and make such investments look unwise.

Sit on your hands is never really good advice though. Our primary filestore is Isilon so the bulk of my caching needs are met on the cluster. However I do have a pending project with a work profile that seems likely to overtax my Isilon unless I'm prepared to use S series nodes. Therefore on the "roll your own" front I'm also interested in something that might sit in front of Isilon but work well with it. I've been noodling solutions but most of them, in the short term, look like SSD buckets with script-driven preloading of 'hot' data i.e. not at all satisfactory except for edge cases.
So, what's your requirement?

--
Thanks,

Peter Devlin
Head of ITTel +44 (0)141 572 2802




A X I S

Axis Productions Limited

7.1 Skypark 1, Elliot Place

Glasgow, G3 8EP

 

axisanimation.com


-----------------------------------------------------------------------

Axis Animation (Axis Productions Ltd)
Registered in Scotland: SC306712

Registered Office: Suite 7-1, The Skypark, 8 Elliot Place, Glasgow G3 8EP

To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe





To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe


0 Plus One's     0 Comments  
   

Response from Saham Ali @ Aug. 9, 2016, 2:05 p.m.

File-based access. Maya scenes, Houdini sims, Nuke comps, EXR sequences of 10k+ frames with 8-10K stereo SBS frame sizes. Streaming and a bit of random. CIFS/SMB 2.0. Roughly a 200-node farm mixed between CPU and GPU rendering, with about 30 users. Servers are currently Server 2008, multiple heads varying from 12- to 16-drive arrays. Massive budget constraints. Trying to ease the pain with a sub-20k solution.
Mind you, this is all inherited, and we are trying to relieve some of the pain that happens when a Nuke job kicks off and I have 120+ nodes all trying to pull 50-500MB EXRs from the same filer. I have helped somewhat by having the comp team use localisation settings in Nuke, but even then it can take a very long time to localize the frame range they need to work with, let alone find the space to copy the entire sequence over. We have gone over optimization techniques like render regions and using 1/4, 1/8 and 1/32 quality in the viewer, etc. It helps, but there just isn't enough disk IO to keep up currently.
We are experimenting with putting comps on an SSD RAID (24 disks or so) and using directory junctions in our content structure, putting output and comp directories on the SSD so the Maya and Houdini guys don't suffer when comp sends off a render.
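(If it's useful to anyone else trying the same thing: on Server 2008 the junction itself is just something along the lines of mklink /J D:\projects\showA\comp E:\ssd_raid\showA\comp, where both paths here are made up; the point is that the link lives in the normal content tree while the data actually lands on the SSD volume.)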


On Tue, Aug 9, 2016 at 1:49 PM, Peter Devlin <peter@axisanimation.com> wrote:
The hive mind tends to produce better results with more info up front. What is your data access model? Block or file? Stream or random? Underlying network transport protocol (NFS vs SMB vs serial)? Numbers of clients? Client and server types? Budget constraints?
We have been using CacheCade for a while on silo CentOS servers rather than central filestore but it's a very noughties cheap-and-cheerful solution. Works well IF you are file-based as opposed to block-based and IF your cache is sized such that a cache flush cannot be caused by large files e.g. FX sims or stream of 9k frames. When you get away from that pattern you need something else e.g. ZFS via good HBAs for block.
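(For reference, the ZFS route usually amounts to bolting an L2ARC read cache onto an existing pool, something along the lines of zpool add tank cache /dev/nvme0n1, where the pool and device names are placeholders, optionally plus a mirrored log device for synchronous writes.)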
From what I've seen on vendor roadmaps, broadly-speaking the question of SSD-based caching for spinning disk is likely to be moot by this time next year. Cost per GB for SSD and flash is predicted to be so far through the floor that spinning disk seems set to go the way of the dodo. Add smart tiering algorithms that are now available in certain filesystems. Net result is that I'd be reluctant to pony up for anything to do with spinning disk in the next 12 months. That buzzword SDS now seems likely to become reality and make such investments look unwise.

Sit on your hands is never really good advice though. Our primary filestore is Isilon so the bulk of my caching needs are met on the cluster. However I do have a pending project with a work profile that seems likely to overtax my Isilon unless I'm prepared to use S series nodes. Therefore on the "roll your own" front I'm also interested in something that might sit in front of Isilon but work well with it. I've been noodling solutions but most of them, in the short term, look like SSD buckets with script-driven preloading of 'hot' data i.e. not at all satisfactory except for edge cases.
So, what's your requirement?

--
Thanks,

Peter Devlin
Head of IT
Tel +44 (0)141 572 2802




A X I S

Axis Productions Limited

7.1 Skypark 1, Elliot Place

Glasgow, G3 8EP

axisanimation.com


-----------------------------------------------------------------------

Axis Animation (Axis Productions Ltd)
Registered in Scotland: SC306712

Registered Office: Suite 7-1, The Skypark, 8 Elliot Place, Glasgow G3 8EP






0 Plus One's     0 Comments