Flat file caching with zero downtime

Every time new content is posted to my site, I regenerate the flat file cache for the first 5 pages of my site. This works great and has greatly reduced server load. 
The problem is that every time the file cache is regenerated, I see a slight dip in traffic, presumably because a small but not negligible percentage of the ~2500 people browsing the site see half-generated pages. 
I'm wondering what a better way to generate these cached pages in PHP would be, one without any risk of users seeing half-written pages. 
EDIT: 
Here is the portion of my .htaccess file that determines whether to load a cached file: 
# If the cookie contains "user", skip the cached-page rules below (S=3 skips the next 3 RewriteRules)
RewriteCond %{HTTP_COOKIE} (user)
RewriteRule (.*)? - [S=3]
RewriteRule ^$ app/webroot/cache_static_html/cache_static_popular_results_1.php [L]
RewriteRule ^popular/page:([2-9])$ app/webroot/cache_static_html/cache_static_popular_results_$1.php [L]

Source: https://stackoverflow.com/questions/8335631

Updated: 2022-05-08 12:05

Best answer

The computations you perform in SomeComputationOnVal are extremely expensive. Each thread reads at least 1 MB of data that is not in cache (at best, a small part of it may hit in L2 if k varies over a small range), which adds up to about 16 TB of data for your run. Even on a high-end GPU, that would take about 2 minutes to run at the very minimum, not to mention everything else that could slow it down. 
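
A rough way to see where the 2-minute figure comes from is to divide the total data read by the GPU's memory bandwidth. Two inputs below are assumptions, not values from the question: a thread count of N = 2^24, which is what the 64 MB output mentioned in this answer implies if each thread writes one 4-byte float, and a sustained bandwidth of about B = 150 GB/s for a high-end GPU. Any L2 hits would only shave a little off this lower bound.

```latex
\begin{aligned}
D &= N \times 1\,\text{MiB} = 2^{24} \times 2^{20}\,\text{B} = 2^{44}\,\text{B} = 16\,\text{TiB} \approx 1.76 \times 10^{13}\,\text{B} \\
t_{\min} &\approx \frac{D}{B} = \frac{1.76 \times 10^{13}\,\text{B}}{1.5 \times 10^{11}\,\text{B/s}} \approx 117\,\text{s} \approx 2\,\text{min}
\end{aligned}
```
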
Your function does not write any data to global memory and has no side effects. If you do not use its output, the compiler may optimize the call away entirely. 
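
A minimal sketch of the dead-code point, assuming a reduction-style computation since the question's code is not shown. someComputation, discardResult, storeResult and runDemo are hypothetical names: the first kernel computes a value and throws it away, so the compiler may eliminate the work, while the second stores the result to global memory and therefore has to perform it.

```cuda
// Build (assumed file name): nvcc --keep dce_demo.cu
// --keep leaves the intermediate files, including the .ptx, in the build
// directory; search it to see whether discardResult still contains the loop.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical stand-in for SomeComputationOnVal: reads n floats per thread.
__device__ float someComputation(const float* in, int n, int offset) {
    float acc = 0.0f;
    for (int k = 0; k < n; ++k) {
        acc += in[(offset + k) % n];
    }
    return acc;
}

// No store to global memory: the kernel has no side effects, so the compiler
// may remove the call and every load it performs.
__global__ void discardResult(const float* in, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float r = someComputation(in, n, idx);
    (void)r;  // result intentionally unused
}

// Storing the result makes it observable, so the work must be done.
__global__ void storeResult(const float* in, float* out, int n, int numThreads) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numThreads) {
        out[idx] = someComputation(in, n, idx);
    }
}

// Fills the input with ones, launches both kernels and returns out[0],
// which equals n because each thread sums n ones.
float runDemo(int n, int numThreads) {
    assert(n > 0 && numThreads > 0);
    std::vector<float> hostIn(n, 1.0f);
    float* dIn = nullptr;
    float* dOut = nullptr;
    check(cudaMalloc(&dIn, n * sizeof(float)));
    check(cudaMalloc(&dOut, numThreads * sizeof(float)));
    check(cudaMemcpy(dIn, hostIn.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    const int block = 256;
    const int grid = (numThreads + block - 1) / block;
    discardResult<<<grid, block>>>(dIn, n);  // may compile to an empty kernel
    storeResult<<<grid, block>>>(dIn, dOut, n, numThreads);
    check(cudaGetLastError());

    float first = 0.0f;
    check(cudaMemcpy(&first, dOut, sizeof(float), cudaMemcpyDeviceToHost));
    check(cudaFree(dIn));
    check(cudaFree(dOut));
    return first;
}
```

Wrapping each launch in cudaEventRecord / cudaEventElapsedTime would likely show discardResult finishing almost instantly compared with storeResult; the exact gap depends on the GPU and the compiler version.
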
Hence cases two and three, which end up not doing the calculation, are very fast: writing 64 MB to GPU memory with coalesced threads is very fast (in the millisecond range). 
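
The claim that a coalesced 64 MB write takes only milliseconds can be checked with a sketch like the following. fillCoalesced and fillAndCount are hypothetical helpers: each thread writes one float, consecutive threads write consecutive addresses, and the kernel is timed with CUDA events.

```cuda
// Build (assumed file name): nvcc coalesced_fill.cu
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Thread i writes element i, so the 32 stores of a warp land in one
// contiguous, 128-byte-aligned segment: a fully coalesced write.
__global__ void fillCoalesced(float* out, size_t count, float value) {
    size_t idx = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (idx < count) {
        out[idx] = value;
    }
}

// Fills `count` floats on the device, stores the kernel time (ms) in
// *elapsedMs and returns how many elements equal `value`.
size_t fillAndCount(size_t count, float value, float* elapsedMs) {
    assert(count > 0 && elapsedMs != nullptr);
    float* dOut = nullptr;
    check(cudaMalloc(&dOut, count * sizeof(float)));

    cudaEvent_t start, stop;
    check(cudaEventCreate(&start));
    check(cudaEventCreate(&stop));

    const int block = 256;  // a multiple of the warp size
    const unsigned grid = (unsigned)((count + block - 1) / block);
    check(cudaEventRecord(start));
    fillCoalesced<<<grid, block>>>(dOut, count, value);
    check(cudaGetLastError());
    check(cudaEventRecord(stop));
    check(cudaEventSynchronize(stop));
    check(cudaEventElapsedTime(elapsedMs, start, stop));

    // Copy back and verify that every element was written.
    std::vector<float> host(count);
    check(cudaMemcpy(host.data(), dOut, count * sizeof(float), cudaMemcpyDeviceToHost));
    size_t matches = 0;
    for (float v : host) {
        matches += (v == value) ? 1 : 0;
    }

    check(cudaEventDestroy(start));
    check(cudaEventDestroy(stop));
    check(cudaFree(dOut));
    return matches;
}
```

Called with count = 16 * 1024 * 1024 (64 MiB of floats), the reported time would typically fall in the sub-millisecond to low-millisecond range on a discrete GPU; the exact value depends on the card's memory bandwidth.
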
You can inspect the generated PTX to check whether the code gets optimized out: compile with nvcc's --keep option, which keeps the intermediate files, and look for the .ptx files.

Related Q&A

is it atomic for writing a struct to same global memory location in CUDA? [2023-08-24]

[This answer was copied from a comment that should have been an answer.] Is it possible that the final Point in that location ends up with thread A's x value and thread B's y value? Yes. To avoid this, you need to write the Point as a single atomic value (i.e., reinterpret the Point as a double or int64 and use an atomic set). ...
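The cost of CUDA global memory transactions [2022-04-05]

Yes, in cached mode a single 128-byte transaction will be generated (as seen from the L1 cache level). In uncached mode, four 32-byte transactions will be generated (as seen from the L2 cache level; it is still a single 128-byte request coming from the warp, thanks to coalescing). In the case you describe, with fully coalesced access, the four 32-byte transactions are not slower, whether in cached or uncached mode. In either case the memory controller (on a given GPU) should generate the same transactions to satisfy the warp's request. Since the memory controller is made up of several (up to 6) "partitions", each with a 64-bit wide path, multiple memory transactions (possibly spanning multiple partitions) may ultimately be used to satisfy either request ...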
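What's the difference between global memory and texture in CUDA? [2022-01-31]

Texture memory refers to a hardware unit that maps onto global memory. Copies between host memory and GPU memory always go to or from global memory; whether a texture unit is mapped onto that piece of global memory makes no difference. You can read more about texture memory in the CUDA Programming Guide. The bilateral filtering sample uses the texture unit to increase memory throughput by exploiting the texture unit's caching mechanism. Benefits of using texture memory: caching of global memory; data caching designed to maximize 2D spatial locality; linear interpolation in hardware; out-of-bounds address handling in hardware ...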
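direct global memory access using cuda [2023-08-29]

Q1: Suppose I have copied an array to the device through stream1 using cudaMemcpyAsync; can I access the values of that array from a different stream, stream2? Yes, the array da can be accessed in both of the kernels you show. However, an important question is whether the previous cudaMemcpyAsync operation has completed (or is guaranteed to be complete): cudaMemcpyAsync(da,a,10*sizeof(float),cudaMemcpyHostToDevice,stream[0]); kernel<<>>(da); kernel<<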
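CUDA Compute Capability 2.0. Global memory access pattern [2022-06-23]

The L2 cache helps in some respects, but it does not remove the need for coalesced access to global memory. In short, coalesced access means that for a given read (or write) instruction, the individual threads in a warp read (or write) adjacent, contiguous locations in global memory, preferably aligned as a group on a 128-byte boundary. This makes the most efficient use of the available memory bandwidth. In practice this is usually not hard to achieve. For example: int idx=threadIdx.x + (blockDim.x * blockIdx.x); int mylocal = global_array[idx]; Assuming it has been allocated in global memory in the ordinary way with cudaMalloc ...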
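Heisenbug in CUDA kernel, global memory access [2023-11-27]

Note that I don't see an explicitly stated question anywhere in the post, so I'm responding to: "I am looking forward to understanding what is going on here." You have a race condition on d_u. By your own statement: • To make the blocks independent of each other, a small overlap on the grid is introduced (grid points 666, 667, 668, 669 of each grid are read by two threads from different blocks, although only one thread writes to them, and this overlap is where the problem occurs). Furthermore, if you comment out the write to d_u, per the statement in your code, the problem goes away. CUDA thread blocks can execute in any order. You have at least two different blocks reading from grid points 666, 667, 668, 669. The result will differ depending on what actually happens ...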
CUDA: addresses to global memory stored in array [2023-08-07]

Since WIDTH is not divisible by BLK_SIZE, you have to insert an if in your code to exclude out-of-range indices: __global__ void testKernel (unsigned int *res, const unsigned int *data, unsigned int *pos) { int idx = blockIdx.x*blockDim.x + threadIdx.x; if (idx < WIDTH) { ... } } In fact, idx in your kernel goes from 0 to 5 ...
Accessing global memory in CUDA is slow [2023-06-07]

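Which is faster in CUDA, global memory or host memory? [2022-03-08]

I think you're misreading things slightly. Yes, it says that single-threaded code on the GPU is usually slower than on the CPU. But that is not because of raw memory bandwidth; it is because the CPU is much more powerful than the GPU when running a single thread. For example, the CPU has pipelining and sophisticated branch prediction for preloading data from memory, whereas the GPU is designed to switch context to another thread while waiting for data. The CPU is tuned for the single-threaded case, while the GPU is tuned for many threads. If you want to know which memory is fastest, look at the technical specifications of your card and motherboard, but that is not what the book is talking about. ...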
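Calling multiple kernels, global memory performances - CUDA [2022-06-01]

If I understood correctly, you're asking whether you should merge the three "multiplybyElement" kernels into one kernel, where each kernel reads an entire (different) matrix, multiplies each element by a constant, and stores the new scaled matrix. Given that these kernels will be bound by memory bandwidth (there is practically no computation, just one multiplication per element), merging them is unlikely to bring any benefit unless your matrices are small, in which case you won't be using the GPU efficiently anyway, because the kernels will execute serially (same stream). ...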
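
The related answer on atomic struct writes recommends storing a Point as one 64-bit value with an atomic operation, so that x and y from different threads cannot be mixed. A minimal sketch, assuming Point is two floats (the question's actual definition is not shown) and using atomicExch on an unsigned long long; writePointAtomic and lastWriterConsistent are hypothetical names.

```cuda
// Build (assumed file name): nvcc atomic_point.cu
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Assumed layout: two floats, 8-byte aligned so the struct fits one 64-bit word.
struct __align__(8) Point {
    float x;
    float y;
};
static_assert(sizeof(Point) == sizeof(unsigned long long), "Point must be 64 bits");

// Abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Every thread writes a Point with x == y. Packing it into one 64-bit word
// and publishing it with atomicExch means the slot always holds a whole
// Point from a single thread, never x from one thread and y from another.
__global__ void writePointAtomic(Point* slot) {
    float v = (float)(blockIdx.x * blockDim.x + threadIdx.x);
    Point p;
    p.x = v;
    p.y = v;
    unsigned long long bits = 0;
    memcpy(&bits, &p, sizeof(bits));  // bit-copy, avoids aliasing problems
    atomicExch(reinterpret_cast<unsigned long long*>(slot), bits);
}

// Launches many competing writers and reports whether the surviving Point
// is consistent (x == y).
bool lastWriterConsistent(int blocks, int threads) {
    assert(blocks > 0 && threads > 0);
    Point* dSlot = nullptr;
    check(cudaMalloc(&dSlot, sizeof(Point)));
    check(cudaMemset(dSlot, 0, sizeof(Point)));
    writePointAtomic<<<blocks, threads>>>(dSlot);
    check(cudaGetLastError());
    Point result;
    check(cudaMemcpy(&result, dSlot, sizeof(Point), cudaMemcpyDeviceToHost));
    check(cudaFree(dSlot));
    return result.x == result.y;
}
```

The __align__(8) matters here: a 64-bit atomic needs an 8-byte-aligned address, and without it reinterpreting the slot pointer as unsigned long long* would not be valid.
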
